2025-12-04T08:56:27.9303915Z Current runner version: '2.330.0'
2025-12-04T08:56:27.9311045Z Runner name: 'i-035b9d8fd6b020edf'
2025-12-04T08:56:27.9312026Z Runner group name: 'Default'
2025-12-04T08:56:27.9313183Z Machine name: 'ip-10-1-59-14'
2025-12-04T08:56:27.9316219Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T08:56:27.9318672Z Contents: read
2025-12-04T08:56:27.9319400Z Metadata: read
2025-12-04T08:56:27.9319949Z ##[endgroup]
2025-12-04T08:56:27.9322915Z Secret source: Actions
2025-12-04T08:56:27.9323839Z Prepare workflow directory
2025-12-04T08:56:27.9885370Z Prepare all required actions
2025-12-04T08:56:27.9930335Z Getting action download info
2025-12-04T08:56:28.3751928Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T08:56:30.7325129Z Download action repository 'pytorch/pytorch@main' (SHA:eabb7ad2128580ef674446027b95bcf4e21e8df3)
2025-12-04T08:56:46.9975104Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065)
2025-12-04T08:56:47.3103079Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T08:56:47.5363353Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-12-04T08:56:47.7522408Z Download action repository 'seemethere/download-artifact-s3@1da556a7aa0a088e3153970611f6c432d58e80e6' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T08:56:48.0184644Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T08:56:48.3114650Z Getting action download info
2025-12-04T08:56:48.4457855Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T08:56:48.7399006Z Getting action download info
2025-12-04T08:56:48.8632110Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-12-04T08:56:49.0727503Z Getting action download info
2025-12-04T08:56:49.1864606Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2025-12-04T08:56:49.3727313Z Getting action download info
2025-12-04T08:56:49.5487425Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T08:56:49.5491427Z ##[group] Inputs
2025-12-04T08:56:49.5491821Z build-environment: linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T08:56:49.5503707Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}
2025-12-04T08:56:49.5515929Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:56:49.5516826Z sync-tag:
2025-12-04T08:56:49.5517659Z timeout-minutes: 360
2025-12-04T08:56:49.5517920Z use-gha:
2025-12-04T08:56:49.5518159Z dashboard-tag:
2025-12-04T08:56:49.5518428Z s3-bucket: gha-artifacts
2025-12-04T08:56:49.5518709Z aws-role-to-assume:
2025-12-04T08:56:49.5519311Z disable-monitor: false
2025-12-04T08:56:49.5519628Z monitor-log-interval: 5
2025-12-04T08:56:49.5519939Z monitor-data-collect-interval: 1
2025-12-04T08:56:49.5520255Z ##[endgroup]
2025-12-04T08:56:49.5521052Z Complete job name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check)
2025-12-04T08:56:49.6230751Z A job started hook has been configured by the self-hosted runner administrator
2025-12-04T08:56:49.6338801Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-12-04T08:56:49.6348723Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:56:49.6349515Z ##[endgroup]
2025-12-04T08:56:51.2147810Z Runner Type: lf.linux.g4dn.12xlarge.nvidia.gpu
2025-12-04T08:56:51.2148439Z Instance Type: g4dn.12xlarge
2025-12-04T08:56:51.2148856Z AMI Name: unknown
2025-12-04T08:56:51.2178343Z AMI ID: ami-08982f1c5bf93d976
2025-12-04T08:56:56.7030450Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2025-12-04T08:56:56.7030967Z with:
2025-12-04T08:56:56.7031625Z github-secret: ***
2025-12-04T08:56:56.7032479Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash
2025-12-04T08:56:56.7033431Z activate-with-label: false
2025-12-04T08:56:56.7033727Z label: with-ssh
2025-12-04T08:56:56.7034041Z remove-existing-keys: true
2025-12-04T08:56:56.7034335Z fail-silently: true
2025-12-04T08:56:56.7034587Z env:
2025-12-04T08:56:56.7034812Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:56.7035092Z ##[endgroup]
2025-12-04T08:56:56.8257517Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-12-04T08:56:56.8258911Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-12-04T08:56:56.8428723Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-12-04T08:56:56.8429414Z with:
2025-12-04T08:56:56.8429672Z no-sudo: true
2025-12-04T08:56:56.8429932Z submodules: recursive
2025-12-04T08:56:56.8430244Z fetch-depth: 0
2025-12-04T08:56:56.8430509Z env:
2025-12-04T08:56:56.8430744Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:56.8431063Z ##[endgroup]
2025-12-04T08:56:56.8508434Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T08:56:56.8509697Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T08:56:56.8518769Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:56:56.8519172Z env:
2025-12-04T08:56:56.8519414Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:56.8519727Z ##[endgroup]
2025-12-04T08:56:56.8607436Z ##[group]Run # Use all available CPUs for fetching
2025-12-04T08:56:56.8608033Z # Use all available CPUs for fetching
2025-12-04T08:56:56.8608548Z cd "${GITHUB_WORKSPACE}"
2025-12-04T08:56:56.8609006Z git config --global fetch.parallel 0
2025-12-04T08:56:56.8609554Z git config --global submodule.fetchJobs 0
2025-12-04T08:56:56.8610037Z 
2025-12-04T08:56:56.8610515Z # Clean workspace. The default checkout action should also do this, but
2025-12-04T08:56:56.8611203Z # do it here as well just in case
2025-12-04T08:56:56.8611615Z if [[ -d .git ]]; then
2025-12-04T08:56:56.8612020Z   if [ -z "${NO_SUDO}" ]; then
2025-12-04T08:56:56.8612524Z     sudo git clean -ffdx
2025-12-04T08:56:56.8612892Z   else
2025-12-04T08:56:56.8613234Z     git clean -ffdx
2025-12-04T08:56:56.8613683Z   fi
2025-12-04T08:56:56.8614020Z fi
2025-12-04T08:56:56.8620512Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:56:56.8621322Z env:
2025-12-04T08:56:56.8621831Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:56.8622243Z NO_SUDO: true
2025-12-04T08:56:56.8622627Z ##[endgroup]
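The step above probes for the /.inarc and /.incontainer marker files and records the result as a step output named IN_CONTAINER_RUNNER. A Python rendering of that bash one-liner, for clarity (the workflow itself runs the bash shown in the log):

    import os

    # True when the runner executes inside a container, signalled by the
    # /.inarc or /.incontainer marker files checked in the step above.
    def in_container_runner() -> bool:
        return os.path.exists("/.inarc") or os.path.exists("/.incontainer")

    # The workflow appends this key=value pair to $GITHUB_OUTPUT; here we
    # simply print it in the same form.
    print(f"IN_CONTAINER_RUNNER={str(in_container_runner()).lower()}")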
2025-12-04T08:56:56.8768037Z ##[group]Run actions/checkout@v4
2025-12-04T08:56:56.8768353Z with:
2025-12-04T08:56:56.8768625Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T08:56:56.8768966Z fetch-depth: 0
2025-12-04T08:56:56.8769221Z submodules: recursive
2025-12-04T08:56:56.8769492Z show-progress: false
2025-12-04T08:56:56.8769757Z repository: pytorch/pytorch
2025-12-04T08:56:56.8770228Z token: ***
2025-12-04T08:56:56.8770464Z ssh-strict: true
2025-12-04T08:56:56.8770710Z ssh-user: git
2025-12-04T08:56:56.8770951Z persist-credentials: true
2025-12-04T08:56:56.8771239Z clean: true
2025-12-04T08:56:56.8771508Z sparse-checkout-cone-mode: true
2025-12-04T08:56:56.8771812Z fetch-tags: false
2025-12-04T08:56:56.8772062Z lfs: false
2025-12-04T08:56:56.8772304Z set-safe-directory: true
2025-12-04T08:56:56.8772574Z env:
2025-12-04T08:56:56.8772801Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:56.8773086Z ##[endgroup]
2025-12-04T08:56:56.9930959Z Syncing repository: pytorch/pytorch
2025-12-04T08:56:56.9932474Z ##[group]Getting Git version info
2025-12-04T08:56:56.9933155Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T08:56:56.9934017Z [command]/usr/bin/git version
2025-12-04T08:56:56.9934320Z git version 2.50.1
2025-12-04T08:56:56.9941991Z ##[endgroup]
2025-12-04T08:56:56.9953294Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/ce61325c-9bbf-4083-9d68-528b3fba0d16/.gitconfig'
2025-12-04T08:56:56.9973295Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/ce61325c-9bbf-4083-9d68-528b3fba0d16' before making global git config changes
2025-12-04T08:56:56.9976247Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T08:56:56.9981781Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T08:56:57.0015717Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T08:56:57.0019456Z ##[group]Initializing the repository
2025-12-04T08:56:57.0023827Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T08:56:57.0055628Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-12-04T08:56:57.0056538Z hint: is subject to change. To configure the initial branch name to use in all
2025-12-04T08:56:57.0057221Z hint: of your new repositories, which will suppress this warning, call:
2025-12-04T08:56:57.0057913Z hint:
2025-12-04T08:56:57.0058279Z hint: git config --global init.defaultBranch <name>
2025-12-04T08:56:57.0058699Z hint:
2025-12-04T08:56:57.0059111Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-12-04T08:56:57.0059793Z hint: 'development'. The just-created branch can be renamed via this command:
2025-12-04T08:56:57.0060322Z hint:
2025-12-04T08:56:57.0060592Z hint: git branch -m <name>
2025-12-04T08:56:57.0060894Z hint:
2025-12-04T08:56:57.0061332Z hint: Disable this message with "git config set advice.defaultBranchName false"
2025-12-04T08:56:57.0062184Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2025-12-04T08:56:57.0065532Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-12-04T08:56:57.0092722Z ##[endgroup]
2025-12-04T08:56:57.0093239Z ##[group]Disabling automatic garbage collection
2025-12-04T08:56:57.0095102Z [command]/usr/bin/git config --local gc.auto 0
2025-12-04T08:56:57.0122116Z ##[endgroup]
2025-12-04T08:56:57.0122679Z ##[group]Setting up auth
2025-12-04T08:56:57.0128786Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T08:56:57.0156755Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T08:56:57.0477777Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T08:56:57.0503524Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T08:56:57.0791370Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T08:56:57.0818917Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T08:56:57.1104955Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T08:56:57.1156415Z ##[endgroup]
2025-12-04T08:56:57.1157067Z ##[group]Fetching the repository
2025-12-04T08:56:57.1163158Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2025-12-04T08:57:43.5148977Z From https://github.com/pytorch/pytorch
2025-12-04T08:57:43.5149718Z  * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+
2025-12-04T08:57:43.5150435Z  * [new branch] 2.9.1 -> origin/2.9.1
2025-12-04T08:57:43.5151105Z  * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest
2025-12-04T08:57:43.5151846Z  * [new branch] Flamefire-patch-1 -> origin/Flamefire-patch-1
2025-12-04T08:57:43.5152537Z  * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes
2025-12-04T08:57:43.5153202Z  * [new branch] HOPrintFunc -> origin/HOPrintFunc
2025-12-04T08:57:43.5154807Z  * [new branch] IvanKobzarev/stack/1 -> origin/IvanKobzarev/stack/1
2025-12-04T08:57:43.5156847Z  * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128
2025-12-04T08:57:43.5157673Z  * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug
2025-12-04T08:57:43.5158896Z  * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix
2025-12-04T08:57:43.5160070Z  * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue
2025-12-04T08:57:43.5161078Z  * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable
2025-12-04T08:57:43.5162150Z  * [new branch] PR-ResetToZero -> origin/PR-ResetToZero
2025-12-04T08:57:43.5163336Z  * [new branch] Update-Flash-Packaging -> 
origin/Update-Flash-Packaging 2025-12-04T08:57:43.5164368Z * [new branch] VLA_exp -> origin/VLA_exp 2025-12-04T08:57:43.5166848Z * [new branch] activation_bench -> origin/activation_bench 2025-12-04T08:57:43.5167973Z * [new branch] addmm-heuristic -> origin/addmm-heuristic 2025-12-04T08:57:43.5169541Z * [new branch] adi/onednn_aarch64 -> origin/adi/onednn_aarch64 2025-12-04T08:57:43.5170601Z * [new branch] adi/test -> origin/adi/test 2025-12-04T08:57:43.5171725Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm 2025-12-04T08:57:43.5172837Z * [new branch] adi/test_m8g -> origin/adi/test_m8g 2025-12-04T08:57:43.5173912Z * [new branch] adi/test_onednn -> origin/adi/test_onednn 2025-12-04T08:57:43.5175075Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9 2025-12-04T08:57:43.5176210Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change 2025-12-04T08:57:43.5177617Z * [new branch] adi/test_timm -> origin/adi/test_timm 2025-12-04T08:57:43.5179159Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change 2025-12-04T08:57:43.5181119Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16 2025-12-04T08:57:43.5182315Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook 2025-12-04T08:57:43.5183633Z * [new branch] albanD-patch-1 -> origin/albanD-patch-1 2025-12-04T08:57:43.5184701Z * [new branch] also-surround-shimh -> origin/also-surround-shimh 2025-12-04T08:57:43.5186721Z * [new branch] angelayi/aot_compile -> origin/angelayi/aot_compile 2025-12-04T08:57:43.5187931Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files 2025-12-04T08:57:43.5188981Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark 2025-12-04T08:57:43.5190253Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization 2025-12-04T08:57:43.5191260Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader 2025-12-04T08:57:43.5192924Z * [new branch] angelayi/inductor_const -> origin/angelayi/inductor_const 2025-12-04T08:57:43.5193850Z * [new branch] angelayi/lstm -> origin/angelayi/lstm 2025-12-04T08:57:43.5195373Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight 2025-12-04T08:57:43.5196780Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers 2025-12-04T08:57:43.5197892Z * [new branch] angelayi/side_eff -> origin/angelayi/side_eff 2025-12-04T08:57:43.5199102Z * [new branch] angelayi/state_dict -> origin/angelayi/state_dict 2025-12-04T08:57:43.5200326Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input 2025-12-04T08:57:43.5201598Z * [new branch] angelayi/symm_mem -> origin/angelayi/symm_mem 2025-12-04T08:57:43.5202730Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp 2025-12-04T08:57:43.5203897Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size 2025-12-04T08:57:43.5205018Z * [new branch] annotate_assert -> origin/annotate_assert 2025-12-04T08:57:43.5206229Z * [new branch] annotate_fallback_kernel -> origin/annotate_fallback_kernel 2025-12-04T08:57:43.5207345Z * [new branch] annotation_deepcopy -> origin/annotation_deepcopy 2025-12-04T08:57:43.5208427Z * [new branch] annotation_dynamo -> origin/annotation_dynamo 2025-12-04T08:57:43.5209591Z * [new branch] aot_eager_stack_trace -> origin/aot_eager_stack_trace 2025-12-04T08:57:43.5210673Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc 2025-12-04T08:57:43.5211836Z * [new branch] aoti_const_device -> origin/aoti_const_device 
2025-12-04T08:57:43.5212917Z * [new branch] aoti_fqn_name_interface -> origin/aoti_fqn_name_interface 2025-12-04T08:57:43.5214061Z * [new branch] aoti_package_weights_binary -> origin/aoti_package_weights_binary 2025-12-04T08:57:43.5215119Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-12-04T08:57:43.5217496Z * [new branch] arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling 2025-12-04T08:57:43.5218490Z * [new branch] async_tp -> origin/async_tp 2025-12-04T08:57:43.5219810Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-12-04T08:57:43.5221204Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 2025-12-04T08:57:43.5222534Z * [new branch] atalman-patch-2 -> origin/atalman-patch-2 2025-12-04T08:57:43.5223784Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3 2025-12-04T08:57:43.5224942Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-12-04T08:57:43.5226171Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-12-04T08:57:43.5227402Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-12-04T08:57:43.5228594Z * [new branch] atalman-patch-7 -> origin/atalman-patch-7 2025-12-04T08:57:43.5229802Z * [new branch] atalman-patch-8 -> origin/atalman-patch-8 2025-12-04T08:57:43.5231264Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-12-04T08:57:43.5232344Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-12-04T08:57:43.5233683Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 2025-12-04T08:57:43.5234924Z * [new branch] attention_benchmarking_clean -> origin/attention_benchmarking_clean 2025-12-04T08:57:43.5236461Z * [new branch] bahuang/dt_fix_scalar_add -> origin/bahuang/dt_fix_scalar_add 2025-12-04T08:57:43.5237544Z * [new branch] bahuang/fix_debug_mode -> origin/bahuang/fix_debug_mode 2025-12-04T08:57:43.5238600Z * [new branch] bahuang/fix_expand -> origin/bahuang/fix_expand 2025-12-04T08:57:43.5239707Z * [new branch] bahuang/test -> origin/bahuang/test 2025-12-04T08:57:43.5241408Z * [new branch] base/1.5 -> origin/base/1.5 2025-12-04T08:57:43.5242832Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-12-04T08:57:43.5243788Z * [new branch] bench_scaled_mm_ops -> origin/bench_scaled_mm_ops 2025-12-04T08:57:43.5245073Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-12-04T08:57:43.5246054Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-12-04T08:57:43.5247619Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-12-04T08:57:43.5249115Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 2025-12-04T08:57:43.5250635Z * [new branch] bf/bug-static-input -> origin/bf/bug-static-input 2025-12-04T08:57:43.5251581Z * [new branch] bf/cg-backend -> origin/bf/cg-backend 2025-12-04T08:57:43.5252621Z * [new branch] bf/cg-nccl-test -> origin/bf/cg-nccl-test 2025-12-04T08:57:43.5253716Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-12-04T08:57:43.5254904Z * [new branch] bf/clean-torchbench-hf -> origin/bf/clean-torchbench-hf 2025-12-04T08:57:43.5255927Z * [new branch] bf/combo-debug-log -> origin/bf/combo-debug-log 2025-12-04T08:57:43.5257296Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-12-04T08:57:43.5259034Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-12-04T08:57:43.5260478Z * 
[new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-12-04T08:57:43.5261538Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-12-04T08:57:43.5262447Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-12-04T08:57:43.5263618Z * [new branch] bf/dynamo-partition -> origin/bf/dynamo-partition 2025-12-04T08:57:43.5264738Z * [new branch] bf/lite -> origin/bf/lite 2025-12-04T08:57:43.5265963Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-12-04T08:57:43.5267215Z * [new branch] bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols 2025-12-04T08:57:43.5268400Z * [new branch] bf/partition-memory-plan -> origin/bf/partition-memory-plan 2025-12-04T08:57:43.5269613Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-12-04T08:57:43.5270808Z * [new branch] bf/partition-view-fallback -> origin/bf/partition-view-fallback 2025-12-04T08:57:43.5271877Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-12-04T08:57:43.5272908Z * [new branch] bf/timm-nov-26-2025 -> origin/bf/timm-nov-26-2025 2025-12-04T08:57:43.5274094Z * [new branch] bf/transformer-pin-4-57-3 -> origin/bf/transformer-pin-4-57-3 2025-12-04T08:57:43.5275220Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-12-04T08:57:43.5276294Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-12-04T08:57:43.5277353Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-12-04T08:57:43.5278398Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d 2025-12-04T08:57:43.5279413Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-12-04T08:57:43.5280489Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-12-04T08:57:43.5281580Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-12-04T08:57:43.5282614Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-12-04T08:57:43.5283664Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-12-04T08:57:43.5284905Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-12-04T08:57:43.5285850Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-12-04T08:57:43.5286905Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-12-04T08:57:43.5288012Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-12-04T08:57:43.5289074Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-12-04T08:57:43.5290097Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-12-04T08:57:43.5291341Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-12-04T08:57:43.5292958Z * [new branch] brister/fx_device_type -> origin/brister/fx_device_type 2025-12-04T08:57:43.5294023Z * [new branch] brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx 2025-12-04T08:57:43.5295206Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-12-04T08:57:43.5296189Z * [new branch] bwd-backup -> origin/bwd-backup 
2025-12-04T08:57:43.5297956Z * [new branch] c57382a49 -> origin/c57382a49 2025-12-04T08:57:43.5298958Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 2025-12-04T08:57:43.5300054Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-12-04T08:57:43.5301799Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-12-04T08:57:43.5302975Z * [new branch] cccclai-patch-1 -> origin/cccclai-patch-1 2025-12-04T08:57:43.5304332Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5305500Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5306744Z * [new branch] cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5307894Z * [new branch] cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5309183Z * [new branch] cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5310472Z * [new branch] cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5311582Z * [new branch] cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5312764Z * [new branch] cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5314017Z * [new branch] cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5315217Z * [new branch] cherry-pick-165922-by-pytorch_bot_bot_ -> origin/cherry-pick-165922-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5316367Z * [new branch] cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5317456Z * [new branch] cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5318602Z * [new branch] cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5319781Z * [new branch] cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5321327Z * [new branch] cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5322577Z * [new branch] cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5323661Z * [new branch] cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5324876Z * [new branch] cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5326105Z * [new branch] cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5327076Z * [new branch] cherry_pick_166036_166040 -> origin/cherry_pick_166036_166040 2025-12-04T08:57:43.5328233Z * [new branch] cherry_pick_166457 -> origin/cherry_pick_166457 2025-12-04T08:57:43.5329479Z * [new branch] cherrypick_166338 -> origin/cherrypick_166338 2025-12-04T08:57:43.5330652Z * [new branch] cherrypick_166458 -> origin/cherrypick_166458 2025-12-04T08:57:43.5331757Z * [new branch] cherrypick_166586 -> origin/cherrypick_166586 2025-12-04T08:57:43.5332895Z * [new branch] cherrypick_166956 -> origin/cherrypick_166956 2025-12-04T08:57:43.5334091Z * [new 
branch] ci_attn -> origin/ci_attn 2025-12-04T08:57:43.5335241Z * [new branch] codex-testing -> origin/codex-testing 2025-12-04T08:57:43.5337512Z * [new branch] codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions 2025-12-04T08:57:43.5338522Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-12-04T08:57:43.5339975Z * [new branch] codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id 2025-12-04T08:57:43.5341321Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-12-04T08:57:43.5342253Z * [new branch] compatiblpy39util -> origin/compatiblpy39util 2025-12-04T08:57:43.5343834Z * [new branch] cond_hop_device -> origin/cond_hop_device 2025-12-04T08:57:43.5344908Z * [new branch] context_test -> origin/context_test 2025-12-04T08:57:43.5346722Z * [new branch] copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip 2025-12-04T08:57:43.5348067Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-12-04T08:57:43.5349364Z * [new branch] cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade 2025-12-04T08:57:43.5350823Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-12-04T08:57:43.5351768Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-12-04T08:57:43.5352822Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-12-04T08:57:43.5353932Z * [new branch] csl/clean_up -> origin/csl/clean_up 2025-12-04T08:57:43.5355313Z * [new branch] csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit 2025-12-04T08:57:43.5356287Z * [new branch] csl/katex -> origin/csl/katex 2025-12-04T08:57:43.5358144Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-12-04T08:57:43.5359551Z * [new branch] csl/lint_testing -> origin/csl/lint_testing 2025-12-04T08:57:43.5360857Z * [new branch] csl/lint_thing -> origin/csl/lint_thing 2025-12-04T08:57:43.5361996Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-12-04T08:57:43.5363150Z * [new branch] csl/manually_gen_json -> origin/csl/manually_gen_json 2025-12-04T08:57:43.5364319Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-12-04T08:57:43.5365427Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-12-04T08:57:43.5366500Z * [new branch] csl/print_timing -> origin/csl/print_timing 2025-12-04T08:57:43.5367602Z * [new branch] csl/remove_experiment -> origin/csl/remove_experiment 2025-12-04T08:57:43.5368751Z * [new branch] csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var 2025-12-04T08:57:43.5369936Z * [new branch] csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel 2025-12-04T08:57:43.5370988Z * [new branch] csl/remove_run_parallel -> origin/csl/remove_run_parallel 2025-12-04T08:57:43.5371993Z * [new branch] csl/remove_unused_vars -> origin/csl/remove_unused_vars 2025-12-04T08:57:43.5373139Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-12-04T08:57:43.5374236Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-12-04T08:57:43.5375352Z * [new branch] csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs 2025-12-04T08:57:43.5376429Z * [new branch] csl/td_job_level -> origin/csl/td_job_level 2025-12-04T08:57:43.5377918Z * [new branch] 
csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner 2025-12-04T08:57:43.5379246Z * [new branch] csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn 2025-12-04T08:57:43.5380338Z * [new branch] csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence 2025-12-04T08:57:43.5381411Z * [new branch] csl/upload_json_running -> origin/csl/upload_json_running 2025-12-04T08:57:43.5382509Z * [new branch] csl/win_sccache -> origin/csl/win_sccache 2025-12-04T08:57:43.5383597Z * [new branch] csl/xml_stuff -> origin/csl/xml_stuff 2025-12-04T08:57:43.5409636Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-12-04T08:57:43.5410265Z * [new branch] cuda_mempool -> origin/cuda_mempool 2025-12-04T08:57:43.5410855Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-12-04T08:57:43.5411520Z * [new branch] d4l3k/debug_plane_frtrace -> origin/d4l3k/debug_plane_frtrace 2025-12-04T08:57:43.5412142Z * [new branch] daxia6/2.8o3 -> origin/daxia6/2.8o3 2025-12-04T08:57:43.5412699Z * [new branch] debug-guard -> origin/debug-guard 2025-12-04T08:57:43.5413277Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-12-04T08:57:43.5414389Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 2025-12-04T08:57:43.5415932Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 2025-12-04T08:57:43.5417332Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-12-04T08:57:43.5418140Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-12-04T08:57:43.5418931Z * [new branch] dev/dhruva/flex_attn_opt -> origin/dev/dhruva/flex_attn_opt 2025-12-04T08:57:43.5419665Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-12-04T08:57:43.5420408Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-12-04T08:57:43.5421422Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-12-04T08:57:43.5422155Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-12-04T08:57:43.5422858Z * [new branch] dev/joona/fix_sdpa_memtest -> origin/dev/joona/fix_sdpa_memtest 2025-12-04T08:57:43.5423603Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-12-04T08:57:43.5424334Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-12-04T08:57:43.5425042Z * [new branch] dev/joona/scalar_clamp -> origin/dev/joona/scalar_clamp 2025-12-04T08:57:43.5425689Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-12-04T08:57:43.5426309Z * [new branch] dev/joona/sdpa_api -> origin/dev/joona/sdpa_api 2025-12-04T08:57:43.5426928Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-12-04T08:57:43.5427620Z * [new branch] dev/joona/ulpAssertClose -> origin/dev/joona/ulpAssertClose 2025-12-04T08:57:43.5428307Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-12-04T08:57:43.5428898Z * [new branch] disp_counter -> origin/disp_counter 2025-12-04T08:57:43.5429507Z * [new branch] divyanshk-patch-1 -> origin/divyanshk-patch-1 2025-12-04T08:57:43.5430097Z * [new branch] docs -> origin/docs 2025-12-04T08:57:43.5430650Z * [new branch] documentation -> origin/documentation 2025-12-04T08:57:43.5431270Z * [new branch] 
eager_model_benchmarks -> origin/eager_model_benchmarks 2025-12-04T08:57:43.5431985Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-12-04T08:57:43.5432734Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-12-04T08:57:43.5433554Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-12-04T08:57:43.5434183Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-12-04T08:57:43.5434741Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-12-04T08:57:43.5435294Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-12-04T08:57:43.5435846Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-12-04T08:57:43.5436704Z * [new branch] eqy-patch-5 -> origin/eqy-patch-5 2025-12-04T08:57:43.5437725Z * [new branch] eqy-patch-6 -> origin/eqy-patch-6 2025-12-04T08:57:43.5439405Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-12-04T08:57:43.5440577Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-12-04T08:57:43.5441513Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-12-04T08:57:43.5442636Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-12-04T08:57:43.5443779Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning 2025-12-04T08:57:43.5445123Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-12-04T08:57:43.5446568Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-12-04T08:57:43.5447500Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-12-04T08:57:43.5448710Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-12-04T08:57:43.5449934Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-12-04T08:57:43.5450911Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-12-04T08:57:43.5452174Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-12-04T08:57:43.5453078Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-12-04T08:57:43.5454199Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-12-04T08:57:43.5455512Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-12-04T08:57:43.5456632Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-12-04T08:57:43.5458014Z * [new branch] exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization 2025-12-04T08:57:43.5459184Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-12-04T08:57:43.5460355Z * [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-12-04T08:57:43.5461510Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-12-04T08:57:43.5462471Z * [new branch] exec -> origin/exec 2025-12-04T08:57:43.5463939Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-12-04T08:57:43.5465060Z * [new branch] export-D61047529 -> origin/export-D61047529 
2025-12-04T08:57:43.5466254Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-12-04T08:57:43.5467640Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-12-04T08:57:43.5468774Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-12-04T08:57:43.5469845Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-12-04T08:57:43.5470957Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-12-04T08:57:43.5472264Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-12-04T08:57:43.5473235Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-12-04T08:57:43.5474306Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-12-04T08:57:43.5475450Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-12-04T08:57:43.5476744Z * [new branch] export-D82250826 -> origin/export-D82250826 2025-12-04T08:57:43.5477735Z * [new branch] export-D82253817 -> origin/export-D82253817 2025-12-04T08:57:43.5478828Z * [new branch] export-D83541846 -> origin/export-D83541846 2025-12-04T08:57:43.5479956Z * [new branch] export-D83627170 -> origin/export-D83627170 2025-12-04T08:57:43.5481086Z * [new branch] export-D83766701 -> origin/export-D83766701 2025-12-04T08:57:43.5482417Z * [new branch] export-D83768878 -> origin/export-D83768878 2025-12-04T08:57:43.5483483Z * [new branch] export-D83769447 -> origin/export-D83769447 2025-12-04T08:57:43.5484521Z * [new branch] export-D84089824 -> origin/export-D84089824 2025-12-04T08:57:43.5485619Z * [new branch] export-D84213020 -> origin/export-D84213020 2025-12-04T08:57:43.5487333Z * [new branch] export-D84373821 -> origin/export-D84373821 2025-12-04T08:57:43.5488375Z * [new branch] export-D84612194 -> origin/export-D84612194 2025-12-04T08:57:43.5489629Z * [new branch] export-D84890985 -> origin/export-D84890985 2025-12-04T08:57:43.5490594Z * [new branch] export-D85122326 -> origin/export-D85122326 2025-12-04T08:57:43.5492356Z * [new branch] export-D86256198 -> origin/export-D86256198 2025-12-04T08:57:43.5493383Z * [new branch] export-D86460608 -> origin/export-D86460608 2025-12-04T08:57:43.5494711Z * [new branch] export-D86474796 -> origin/export-D86474796 2025-12-04T08:57:43.5495860Z * [new branch] export-D86712396 -> origin/export-D86712396 2025-12-04T08:57:43.5497427Z * [new branch] export-D87022129 -> origin/export-D87022129 2025-12-04T08:57:43.5498630Z * [new branch] export-D87838959 -> origin/export-D87838959 2025-12-04T08:57:43.5499857Z * [new branch] export-D88319437 -> origin/export-D88319437 2025-12-04T08:57:43.5501230Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-12-04T08:57:43.5502332Z * [new branch] ezyang-titan-october -> origin/ezyang-titan-october 2025-12-04T08:57:43.5503462Z * [new branch] ezyang-titan-october2 -> origin/ezyang-titan-october2 2025-12-04T08:57:43.5504524Z * [new branch] ezyang-war -> origin/ezyang-war 2025-12-04T08:57:43.5506175Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-12-04T08:57:43.5507204Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-12-04T08:57:43.5508992Z * [new branch] fadeputr/sequence_fbgemm -> origin/fadeputr/sequence_fbgemm 2025-12-04T08:57:43.5510049Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-12-04T08:57:43.5511617Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-12-04T08:57:43.5512864Z * [new branch] fca -> origin/fca 2025-12-04T08:57:43.5513907Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 
2025-12-04T08:57:43.5514991Z * [new branch] fca5 -> origin/fca5 2025-12-04T08:57:43.5516597Z * [new branch] feature/justknobs-cpp -> origin/feature/justknobs-cpp 2025-12-04T08:57:43.5517678Z * [new branch] feature/numa-forkserver -> origin/feature/numa-forkserver 2025-12-04T08:57:43.5519775Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-12-04T08:57:43.5520931Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-12-04T08:57:43.5522889Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-12-04T08:57:43.5523969Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-12-04T08:57:43.5525088Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-12-04T08:57:43.5526172Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-12-04T08:57:43.5527173Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-12-04T08:57:43.5528263Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-12-04T08:57:43.5529322Z * [new branch] findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-12-04T08:57:43.5530411Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-12-04T08:57:43.5531660Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-12-04T08:57:43.5532776Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-12-04T08:57:43.5533997Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-12-04T08:57:43.5535455Z * [new branch] fix_addmm_issue -> origin/fix_addmm_issue 2025-12-04T08:57:43.5536477Z * [new branch] fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims 2025-12-04T08:57:43.5537809Z * [new branch] fix_bench_bwd_pass -> origin/fix_bench_bwd_pass 2025-12-04T08:57:43.5538864Z * [new branch] fix_mem_profiler_config -> origin/fix_mem_profiler_config 2025-12-04T08:57:43.5539970Z * [new branch] fix_nvrtc_discovery -> origin/fix_nvrtc_discovery 2025-12-04T08:57:43.5541049Z * [new branch] fix_op_runner -> origin/fix_op_runner 2025-12-04T08:57:43.5542164Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-12-04T08:57:43.5543357Z * [new branch] fixes-triage -> origin/fixes-triage 2025-12-04T08:57:43.5544486Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-12-04T08:57:43.5545820Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-12-04T08:57:43.5546837Z * [new branch] flex-flash -> origin/flex-flash 2025-12-04T08:57:43.5548079Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 2025-12-04T08:57:43.5549211Z * [new branch] flex_flash -> origin/flex_flash 2025-12-04T08:57:43.5550953Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-12-04T08:57:43.5552029Z * [new branch] fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler 2025-12-04T08:57:43.5553015Z * [new branch] forkserver_fix -> origin/forkserver_fix 2025-12-04T08:57:43.5554088Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-12-04T08:57:43.5555221Z * [new branch] fx_cpp -> origin/fx_cpp 2025-12-04T08:57:43.5556892Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-12-04T08:57:43.5558036Z * [new branch] galv-patch-1 -> origin/galv-patch-1 2025-12-04T08:57:43.5559899Z * [new branch] galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4 2025-12-04T08:57:43.5561222Z * [new branch] georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch 
2025-12-04T08:57:43.5563576Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-12-04T08:57:43.5564602Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-12-04T08:57:43.5566565Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-12-04T08:57:43.5567611Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-12-04T08:57:43.5569837Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-12-04T08:57:43.5570887Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-12-04T08:57:43.5572826Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-12-04T08:57:43.5573929Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-12-04T08:57:43.5575065Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-12-04T08:57:43.5576862Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-12-04T08:57:43.5577991Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-12-04T08:57:43.5579138Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-12-04T08:57:43.5580750Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-12-04T08:57:43.5581941Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-12-04T08:57:43.5582934Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-12-04T08:57:43.5584436Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-12-04T08:57:43.5585475Z * [new branch] gh/H-Huang/182/head -> origin/gh/H-Huang/182/head 2025-12-04T08:57:43.5586623Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-12-04T08:57:43.5588196Z * [new branch] gh/H-Huang/226/base -> origin/gh/H-Huang/226/base 2025-12-04T08:57:43.5589672Z * [new branch] gh/H-Huang/226/head -> origin/gh/H-Huang/226/head 2025-12-04T08:57:43.5590720Z * [new branch] gh/H-Huang/226/orig -> origin/gh/H-Huang/226/orig 2025-12-04T08:57:43.5592219Z * [new branch] gh/H-Huang/228/base -> origin/gh/H-Huang/228/base 2025-12-04T08:57:43.5593267Z * [new branch] gh/H-Huang/228/head -> origin/gh/H-Huang/228/head 2025-12-04T08:57:43.5594340Z * [new branch] gh/H-Huang/228/orig -> origin/gh/H-Huang/228/orig 2025-12-04T08:57:43.5596354Z * [new branch] gh/IvanKobzarev/150/base -> origin/gh/IvanKobzarev/150/base 2025-12-04T08:57:43.5597349Z * [new branch] gh/IvanKobzarev/150/head -> origin/gh/IvanKobzarev/150/head 2025-12-04T08:57:43.5598448Z * [new branch] gh/IvanKobzarev/150/orig -> origin/gh/IvanKobzarev/150/orig 2025-12-04T08:57:43.5600113Z * [new branch] gh/IvanKobzarev/157/base -> origin/gh/IvanKobzarev/157/base 2025-12-04T08:57:43.5601226Z * [new branch] gh/IvanKobzarev/157/head -> origin/gh/IvanKobzarev/157/head 2025-12-04T08:57:43.5602330Z * [new branch] gh/IvanKobzarev/157/orig -> origin/gh/IvanKobzarev/157/orig 2025-12-04T08:57:43.5603950Z * [new branch] gh/IvanKobzarev/159/base -> origin/gh/IvanKobzarev/159/base 2025-12-04T08:57:43.5604995Z * [new branch] gh/IvanKobzarev/159/head -> origin/gh/IvanKobzarev/159/head 2025-12-04T08:57:43.5606101Z * [new branch] gh/IvanKobzarev/159/orig -> origin/gh/IvanKobzarev/159/orig 2025-12-04T08:57:43.5607671Z * [new branch] gh/IvanKobzarev/162/base -> origin/gh/IvanKobzarev/162/base 2025-12-04T08:57:43.5608886Z * [new branch] gh/IvanKobzarev/162/head -> origin/gh/IvanKobzarev/162/head 2025-12-04T08:57:43.5609971Z * [new branch] gh/IvanKobzarev/162/orig -> origin/gh/IvanKobzarev/162/orig 2025-12-04T08:57:43.5611508Z * [new branch] 
gh/IvanKobzarev/163/base -> origin/gh/IvanKobzarev/163/base 2025-12-04T08:57:43.5612529Z * [new branch] gh/IvanKobzarev/163/head -> origin/gh/IvanKobzarev/163/head 2025-12-04T08:57:43.5613743Z * [new branch] gh/IvanKobzarev/163/orig -> origin/gh/IvanKobzarev/163/orig 2025-12-04T08:57:43.5615374Z * [new branch] gh/IvanKobzarev/166/base -> origin/gh/IvanKobzarev/166/base 2025-12-04T08:57:43.5616494Z * [new branch] gh/IvanKobzarev/166/head -> origin/gh/IvanKobzarev/166/head 2025-12-04T08:57:43.5617892Z * [new branch] gh/IvanKobzarev/166/orig -> origin/gh/IvanKobzarev/166/orig 2025-12-04T08:57:43.5619556Z * [new branch] gh/IvanKobzarev/167/base -> origin/gh/IvanKobzarev/167/base 2025-12-04T08:57:43.5620555Z * [new branch] gh/IvanKobzarev/167/head -> origin/gh/IvanKobzarev/167/head 2025-12-04T08:57:43.5621918Z * [new branch] gh/IvanKobzarev/167/orig -> origin/gh/IvanKobzarev/167/orig 2025-12-04T08:57:43.5623511Z * [new branch] gh/IvanKobzarev/168/base -> origin/gh/IvanKobzarev/168/base 2025-12-04T08:57:43.5624573Z * [new branch] gh/IvanKobzarev/168/head -> origin/gh/IvanKobzarev/168/head 2025-12-04T08:57:43.5625886Z * [new branch] gh/IvanKobzarev/168/orig -> origin/gh/IvanKobzarev/168/orig 2025-12-04T08:57:43.5627323Z * [new branch] gh/IvanKobzarev/169/base -> origin/gh/IvanKobzarev/169/base 2025-12-04T08:57:43.5628426Z * [new branch] gh/IvanKobzarev/169/head -> origin/gh/IvanKobzarev/169/head 2025-12-04T08:57:43.5629553Z * [new branch] gh/IvanKobzarev/169/orig -> origin/gh/IvanKobzarev/169/orig 2025-12-04T08:57:43.5630971Z * [new branch] gh/IvanKobzarev/170/base -> origin/gh/IvanKobzarev/170/base 2025-12-04T08:57:43.5632075Z * [new branch] gh/IvanKobzarev/170/head -> origin/gh/IvanKobzarev/170/head 2025-12-04T08:57:43.5633304Z * [new branch] gh/IvanKobzarev/170/orig -> origin/gh/IvanKobzarev/170/orig 2025-12-04T08:57:43.5635029Z * [new branch] gh/IvanKobzarev/171/base -> origin/gh/IvanKobzarev/171/base 2025-12-04T08:57:43.5636063Z * [new branch] gh/IvanKobzarev/171/head -> origin/gh/IvanKobzarev/171/head 2025-12-04T08:57:43.5637170Z * [new branch] gh/IvanKobzarev/171/orig -> origin/gh/IvanKobzarev/171/orig 2025-12-04T08:57:43.5638804Z * [new branch] gh/IvanKobzarev/172/base -> origin/gh/IvanKobzarev/172/base 2025-12-04T08:57:43.5639899Z * [new branch] gh/IvanKobzarev/172/head -> origin/gh/IvanKobzarev/172/head 2025-12-04T08:57:43.5640994Z * [new branch] gh/IvanKobzarev/172/orig -> origin/gh/IvanKobzarev/172/orig 2025-12-04T08:57:43.5642609Z * [new branch] gh/IvanKobzarev/173/base -> origin/gh/IvanKobzarev/173/base 2025-12-04T08:57:43.5643652Z * [new branch] gh/IvanKobzarev/173/head -> origin/gh/IvanKobzarev/173/head 2025-12-04T08:57:43.5644723Z * [new branch] gh/IvanKobzarev/173/orig -> origin/gh/IvanKobzarev/173/orig 2025-12-04T08:57:43.5646308Z * [new branch] gh/IvanKobzarev/174/base -> origin/gh/IvanKobzarev/174/base 2025-12-04T08:57:43.5647428Z * [new branch] gh/IvanKobzarev/174/head -> origin/gh/IvanKobzarev/174/head 2025-12-04T08:57:43.5648608Z * [new branch] gh/IvanKobzarev/174/orig -> origin/gh/IvanKobzarev/174/orig 2025-12-04T08:57:43.5650109Z * [new branch] gh/IvanKobzarev/175/base -> origin/gh/IvanKobzarev/175/base 2025-12-04T08:57:43.5651261Z * [new branch] gh/IvanKobzarev/175/head -> origin/gh/IvanKobzarev/175/head 2025-12-04T08:57:43.5652339Z * [new branch] gh/IvanKobzarev/175/orig -> origin/gh/IvanKobzarev/175/orig 2025-12-04T08:57:43.5654136Z * [new branch] gh/IvanKobzarev/176/base -> origin/gh/IvanKobzarev/176/base 2025-12-04T08:57:43.5655165Z * [new branch] 
gh/IvanKobzarev/176/head -> origin/gh/IvanKobzarev/176/head 2025-12-04T08:57:43.5656243Z * [new branch] gh/IvanKobzarev/176/orig -> origin/gh/IvanKobzarev/176/orig 2025-12-04T08:57:43.5658421Z * [new branch] gh/IvanKobzarev/177/base -> origin/gh/IvanKobzarev/177/base 2025-12-04T08:57:43.5659530Z * [new branch] gh/IvanKobzarev/177/head -> origin/gh/IvanKobzarev/177/head 2025-12-04T08:57:43.5660758Z * [new branch] gh/IvanKobzarev/177/orig -> origin/gh/IvanKobzarev/177/orig 2025-12-04T08:57:43.5662435Z * [new branch] gh/IvanKobzarev/178/base -> origin/gh/IvanKobzarev/178/base 2025-12-04T08:57:43.5663568Z * [new branch] gh/IvanKobzarev/178/head -> origin/gh/IvanKobzarev/178/head 2025-12-04T08:57:43.5664708Z * [new branch] gh/IvanKobzarev/178/orig -> origin/gh/IvanKobzarev/178/orig 2025-12-04T08:57:43.5666401Z * [new branch] gh/IvanKobzarev/179/base -> origin/gh/IvanKobzarev/179/base 2025-12-04T08:57:43.5667416Z * [new branch] gh/IvanKobzarev/179/head -> origin/gh/IvanKobzarev/179/head 2025-12-04T08:57:43.5668550Z * [new branch] gh/IvanKobzarev/179/orig -> origin/gh/IvanKobzarev/179/orig 2025-12-04T08:57:43.5670417Z * [new branch] gh/IvanKobzarev/180/base -> origin/gh/IvanKobzarev/180/base 2025-12-04T08:57:43.5671444Z * [new branch] gh/IvanKobzarev/180/head -> origin/gh/IvanKobzarev/180/head 2025-12-04T08:57:43.5672589Z * [new branch] gh/IvanKobzarev/180/orig -> origin/gh/IvanKobzarev/180/orig 2025-12-04T08:57:43.5674390Z * [new branch] gh/IvanKobzarev/181/base -> origin/gh/IvanKobzarev/181/base 2025-12-04T08:57:43.5675512Z * [new branch] gh/IvanKobzarev/181/head -> origin/gh/IvanKobzarev/181/head 2025-12-04T08:57:43.5676644Z * [new branch] gh/IvanKobzarev/181/orig -> origin/gh/IvanKobzarev/181/orig 2025-12-04T08:57:43.5678470Z * [new branch] gh/IvanKobzarev/182/base -> origin/gh/IvanKobzarev/182/base 2025-12-04T08:57:43.5679475Z * [new branch] gh/IvanKobzarev/182/head -> origin/gh/IvanKobzarev/182/head 2025-12-04T08:57:43.5680687Z * [new branch] gh/IvanKobzarev/182/orig -> origin/gh/IvanKobzarev/182/orig 2025-12-04T08:57:43.5682428Z * [new branch] gh/IvanKobzarev/183/base -> origin/gh/IvanKobzarev/183/base 2025-12-04T08:57:43.5683518Z * [new branch] gh/IvanKobzarev/183/head -> origin/gh/IvanKobzarev/183/head 2025-12-04T08:57:43.5684663Z * [new branch] gh/IvanKobzarev/183/orig -> origin/gh/IvanKobzarev/183/orig 2025-12-04T08:57:43.5686242Z * [new branch] gh/IvanKobzarev/184/base -> origin/gh/IvanKobzarev/184/base 2025-12-04T08:57:43.5687309Z * [new branch] gh/IvanKobzarev/184/head -> origin/gh/IvanKobzarev/184/head 2025-12-04T08:57:43.5688435Z * [new branch] gh/IvanKobzarev/184/orig -> origin/gh/IvanKobzarev/184/orig 2025-12-04T08:57:43.5690295Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-12-04T08:57:43.5691440Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-12-04T08:57:43.5692831Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-12-04T08:57:43.5693821Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-12-04T08:57:43.5695621Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-12-04T08:57:43.5697104Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-12-04T08:57:43.5698742Z * [new branch] gh/NikhilAPatel/5/base -> origin/gh/NikhilAPatel/5/base 2025-12-04T08:57:43.5699838Z * [new branch] gh/NikhilAPatel/5/head -> origin/gh/NikhilAPatel/5/head 2025-12-04T08:57:43.5700999Z * [new branch] gh/NikhilAPatel/5/orig -> 
origin/gh/NikhilAPatel/5/orig 2025-12-04T08:57:43.5702824Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 2025-12-04T08:57:43.5703862Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-12-04T08:57:43.5705008Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-12-04T08:57:43.5706609Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-12-04T08:57:43.5707658Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-12-04T08:57:43.5708864Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-12-04T08:57:43.5710429Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-12-04T08:57:43.5711563Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-12-04T08:57:43.5712594Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-12-04T08:57:43.5714136Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-12-04T08:57:43.5715139Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-12-04T08:57:43.5716356Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-12-04T08:57:43.5717635Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-12-04T08:57:43.5718698Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-12-04T08:57:43.5719816Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-12-04T08:57:43.5721760Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-12-04T08:57:43.5722860Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-12-04T08:57:43.5723982Z * [new branch] gh/PaliC/24/orig -> origin/gh/PaliC/24/orig 2025-12-04T08:57:43.5725501Z * [new branch] gh/PaliC/25/head -> origin/gh/PaliC/25/head 2025-12-04T08:57:43.5726567Z * [new branch] gh/PaliC/25/next -> origin/gh/PaliC/25/next 2025-12-04T08:57:43.5727727Z * [new branch] gh/PaliC/25/orig -> origin/gh/PaliC/25/orig 2025-12-04T08:57:43.5729271Z * [new branch] gh/PaliC/26/head -> origin/gh/PaliC/26/head 2025-12-04T08:57:43.5730160Z * [new branch] gh/PaliC/26/next -> origin/gh/PaliC/26/next 2025-12-04T08:57:43.5731311Z * [new branch] gh/PaliC/26/orig -> origin/gh/PaliC/26/orig 2025-12-04T08:57:43.5732855Z * [new branch] gh/PaliC/27/next -> origin/gh/PaliC/27/next 2025-12-04T08:57:43.5734455Z * [new branch] gh/PaliC/28/head -> origin/gh/PaliC/28/head 2025-12-04T08:57:43.5735405Z * [new branch] gh/PaliC/28/next -> origin/gh/PaliC/28/next 2025-12-04T08:57:43.5736560Z * [new branch] gh/PaliC/28/orig -> origin/gh/PaliC/28/orig 2025-12-04T08:57:43.5738374Z * [new branch] gh/PaliC/29/head -> origin/gh/PaliC/29/head 2025-12-04T08:57:43.5739286Z * [new branch] gh/PaliC/29/next -> origin/gh/PaliC/29/next 2025-12-04T08:57:43.5740412Z * [new branch] gh/PaliC/29/orig -> origin/gh/PaliC/29/orig 2025-12-04T08:57:43.5741979Z * [new branch] gh/PaliC/30/head -> origin/gh/PaliC/30/head 2025-12-04T08:57:43.5742891Z * [new branch] gh/PaliC/30/next -> origin/gh/PaliC/30/next 2025-12-04T08:57:43.5744036Z * [new branch] gh/PaliC/30/orig -> origin/gh/PaliC/30/orig 2025-12-04T08:57:43.5745538Z * [new branch] gh/PaliC/31/head -> origin/gh/PaliC/31/head 2025-12-04T08:57:43.5746674Z * [new branch] gh/PaliC/31/next -> origin/gh/PaliC/31/next 2025-12-04T08:57:43.5747798Z * [new branch] gh/PaliC/31/orig -> origin/gh/PaliC/31/orig 2025-12-04T08:57:43.5749746Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-12-04T08:57:43.5750891Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-12-04T08:57:43.5752064Z * [new branch] gh/PaulZhang12/25/orig 
-> origin/gh/PaulZhang12/25/orig 2025-12-04T08:57:43.5753639Z * [new branch] gh/PaulZhang12/28/base -> origin/gh/PaulZhang12/28/base 2025-12-04T08:57:43.5754744Z * [new branch] gh/PaulZhang12/28/head -> origin/gh/PaulZhang12/28/head 2025-12-04T08:57:43.5755843Z * [new branch] gh/PaulZhang12/28/orig -> origin/gh/PaulZhang12/28/orig 2025-12-04T08:57:43.5757598Z * [new branch] gh/PaulZhang12/31/base -> origin/gh/PaulZhang12/31/base 2025-12-04T08:57:43.5758625Z * [new branch] gh/PaulZhang12/31/head -> origin/gh/PaulZhang12/31/head 2025-12-04T08:57:43.5759787Z * [new branch] gh/PaulZhang12/31/orig -> origin/gh/PaulZhang12/31/orig 2025-12-04T08:57:43.5762071Z * [new branch] gh/PaulZhang12/37/base -> origin/gh/PaulZhang12/37/base 2025-12-04T08:57:43.5763053Z * [new branch] gh/PaulZhang12/37/head -> origin/gh/PaulZhang12/37/head 2025-12-04T08:57:43.5763881Z * [new branch] gh/PaulZhang12/37/orig -> origin/gh/PaulZhang12/37/orig 2025-12-04T08:57:43.5764999Z * [new branch] gh/PaulZhang12/40/base -> origin/gh/PaulZhang12/40/base 2025-12-04T08:57:43.5766062Z * [new branch] gh/PaulZhang12/40/head -> origin/gh/PaulZhang12/40/head 2025-12-04T08:57:43.5767151Z * [new branch] gh/PaulZhang12/40/orig -> origin/gh/PaulZhang12/40/orig 2025-12-04T08:57:43.5768734Z * [new branch] gh/PaulZhang12/42/base -> origin/gh/PaulZhang12/42/base 2025-12-04T08:57:43.5769749Z * [new branch] gh/PaulZhang12/42/head -> origin/gh/PaulZhang12/42/head 2025-12-04T08:57:43.5771279Z * [new branch] gh/PaulZhang12/43/base -> origin/gh/PaulZhang12/43/base 2025-12-04T08:57:43.5772371Z * [new branch] gh/PaulZhang12/43/head -> origin/gh/PaulZhang12/43/head 2025-12-04T08:57:43.5773461Z * [new branch] gh/PaulZhang12/43/orig -> origin/gh/PaulZhang12/43/orig 2025-12-04T08:57:43.5774880Z * [new branch] gh/PaulZhang12/44/base -> origin/gh/PaulZhang12/44/base 2025-12-04T08:57:43.5775906Z * [new branch] gh/PaulZhang12/44/head -> origin/gh/PaulZhang12/44/head 2025-12-04T08:57:43.5777910Z * [new branch] gh/PaulZhang12/45/base -> origin/gh/PaulZhang12/45/base 2025-12-04T08:57:43.5778916Z * [new branch] gh/PaulZhang12/45/head -> origin/gh/PaulZhang12/45/head 2025-12-04T08:57:43.5780047Z * [new branch] gh/PaulZhang12/45/orig -> origin/gh/PaulZhang12/45/orig 2025-12-04T08:57:43.5781650Z * [new branch] gh/PaulZhang12/46/base -> origin/gh/PaulZhang12/46/base 2025-12-04T08:57:43.5782801Z * [new branch] gh/PaulZhang12/46/head -> origin/gh/PaulZhang12/46/head 2025-12-04T08:57:43.5783946Z * [new branch] gh/PaulZhang12/46/orig -> origin/gh/PaulZhang12/46/orig 2025-12-04T08:57:43.5785578Z * [new branch] gh/PaulZhang12/47/base -> origin/gh/PaulZhang12/47/base 2025-12-04T08:57:43.5786730Z * [new branch] gh/PaulZhang12/47/head -> origin/gh/PaulZhang12/47/head 2025-12-04T08:57:43.5787873Z * [new branch] gh/PaulZhang12/47/orig -> origin/gh/PaulZhang12/47/orig 2025-12-04T08:57:43.5789370Z * [new branch] gh/PaulZhang12/48/base -> origin/gh/PaulZhang12/48/base 2025-12-04T08:57:43.5790414Z * [new branch] gh/PaulZhang12/48/head -> origin/gh/PaulZhang12/48/head 2025-12-04T08:57:43.5791521Z * [new branch] gh/PaulZhang12/48/orig -> origin/gh/PaulZhang12/48/orig 2025-12-04T08:57:43.5793300Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-12-04T08:57:43.5794361Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-12-04T08:57:43.5796367Z * [new branch] gh/SherlockNoMad/1/base -> origin/gh/SherlockNoMad/1/base 2025-12-04T08:57:43.5797441Z * [new branch] gh/SherlockNoMad/1/head -> origin/gh/SherlockNoMad/1/head 
2025-12-04T08:57:43.5799001Z * [new branch] gh/SherlockNoMad/10/base -> origin/gh/SherlockNoMad/10/base 2025-12-04T08:57:43.5800061Z * [new branch] gh/SherlockNoMad/10/head -> origin/gh/SherlockNoMad/10/head 2025-12-04T08:57:43.5801264Z * [new branch] gh/SherlockNoMad/10/orig -> origin/gh/SherlockNoMad/10/orig 2025-12-04T08:57:43.5802663Z * [new branch] gh/SherlockNoMad/11/base -> origin/gh/SherlockNoMad/11/base 2025-12-04T08:57:43.5803693Z * [new branch] gh/SherlockNoMad/11/head -> origin/gh/SherlockNoMad/11/head 2025-12-04T08:57:43.5804804Z * [new branch] gh/SherlockNoMad/11/orig -> origin/gh/SherlockNoMad/11/orig 2025-12-04T08:57:43.5806201Z * [new branch] gh/SherlockNoMad/12/base -> origin/gh/SherlockNoMad/12/base 2025-12-04T08:57:43.5807185Z * [new branch] gh/SherlockNoMad/12/head -> origin/gh/SherlockNoMad/12/head 2025-12-04T08:57:43.5808273Z * [new branch] gh/SherlockNoMad/12/orig -> origin/gh/SherlockNoMad/12/orig 2025-12-04T08:57:43.5809875Z * [new branch] gh/SherlockNoMad/15/base -> origin/gh/SherlockNoMad/15/base 2025-12-04T08:57:43.5811334Z * [new branch] gh/SherlockNoMad/15/head -> origin/gh/SherlockNoMad/15/head 2025-12-04T08:57:43.5812296Z * [new branch] gh/SherlockNoMad/15/orig -> origin/gh/SherlockNoMad/15/orig 2025-12-04T08:57:43.5813886Z * [new branch] gh/SherlockNoMad/17/base -> origin/gh/SherlockNoMad/17/base 2025-12-04T08:57:43.5814916Z * [new branch] gh/SherlockNoMad/17/head -> origin/gh/SherlockNoMad/17/head 2025-12-04T08:57:43.5816017Z * [new branch] gh/SherlockNoMad/17/orig -> origin/gh/SherlockNoMad/17/orig 2025-12-04T08:57:43.5818082Z * [new branch] gh/SherlockNoMad/18/base -> origin/gh/SherlockNoMad/18/base 2025-12-04T08:57:43.5819232Z * [new branch] gh/SherlockNoMad/18/head -> origin/gh/SherlockNoMad/18/head 2025-12-04T08:57:43.5820403Z * [new branch] gh/SherlockNoMad/18/orig -> origin/gh/SherlockNoMad/18/orig 2025-12-04T08:57:43.5822105Z * [new branch] gh/SherlockNoMad/19/base -> origin/gh/SherlockNoMad/19/base 2025-12-04T08:57:43.5823261Z * [new branch] gh/SherlockNoMad/19/head -> origin/gh/SherlockNoMad/19/head 2025-12-04T08:57:43.5824457Z * [new branch] gh/SherlockNoMad/19/orig -> origin/gh/SherlockNoMad/19/orig 2025-12-04T08:57:43.5825901Z * [new branch] gh/SherlockNoMad/2/base -> origin/gh/SherlockNoMad/2/base 2025-12-04T08:57:43.5826876Z * [new branch] gh/SherlockNoMad/2/head -> origin/gh/SherlockNoMad/2/head 2025-12-04T08:57:43.5828295Z * [new branch] gh/SherlockNoMad/20/base -> origin/gh/SherlockNoMad/20/base 2025-12-04T08:57:43.5829468Z * [new branch] gh/SherlockNoMad/20/head -> origin/gh/SherlockNoMad/20/head 2025-12-04T08:57:43.5830487Z * [new branch] gh/SherlockNoMad/20/orig -> origin/gh/SherlockNoMad/20/orig 2025-12-04T08:57:43.5832262Z * [new branch] gh/SherlockNoMad/21/base -> origin/gh/SherlockNoMad/21/base 2025-12-04T08:57:43.5833471Z * [new branch] gh/SherlockNoMad/21/head -> origin/gh/SherlockNoMad/21/head 2025-12-04T08:57:43.5834502Z * [new branch] gh/SherlockNoMad/21/orig -> origin/gh/SherlockNoMad/21/orig 2025-12-04T08:57:43.5835946Z * [new branch] gh/SherlockNoMad/3/base -> origin/gh/SherlockNoMad/3/base 2025-12-04T08:57:43.5836924Z * [new branch] gh/SherlockNoMad/3/head -> origin/gh/SherlockNoMad/3/head 2025-12-04T08:57:43.5838301Z * [new branch] gh/SherlockNoMad/4/base -> origin/gh/SherlockNoMad/4/base 2025-12-04T08:57:43.5839269Z * [new branch] gh/SherlockNoMad/4/head -> origin/gh/SherlockNoMad/4/head 2025-12-04T08:57:43.5840662Z * [new branch] gh/SherlockNoMad/5/base -> origin/gh/SherlockNoMad/5/base 2025-12-04T08:57:43.5841630Z * 
[new branch] gh/SherlockNoMad/5/head -> origin/gh/SherlockNoMad/5/head 2025-12-04T08:57:43.5843952Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-12-04T08:57:43.5845346Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-12-04T08:57:43.5846639Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-12-04T08:57:43.5848302Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-12-04T08:57:43.5850019Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-12-04T08:57:43.5851163Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-12-04T08:57:43.5852596Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-12-04T08:57:43.5853617Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-12-04T08:57:43.5855012Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-12-04T08:57:43.5856059Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-12-04T08:57:43.5857921Z * [new branch] gh/StrongerXi/73/base -> origin/gh/StrongerXi/73/base 2025-12-04T08:57:43.5858984Z * [new branch] gh/StrongerXi/73/head -> origin/gh/StrongerXi/73/head 2025-12-04T08:57:43.5860168Z * [new branch] gh/StrongerXi/73/orig -> origin/gh/StrongerXi/73/orig 2025-12-04T08:57:43.5862210Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-12-04T08:57:43.5863252Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-12-04T08:57:43.5864424Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-12-04T08:57:43.5865999Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-12-04T08:57:43.5867083Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-12-04T08:57:43.5868203Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-12-04T08:57:43.5869955Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-12-04T08:57:43.5870983Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-12-04T08:57:43.5872028Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-12-04T08:57:43.5873561Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-12-04T08:57:43.5874594Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-12-04T08:57:43.5875680Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-12-04T08:57:43.5877082Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-12-04T08:57:43.5878281Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-12-04T08:57:43.5879389Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-12-04T08:57:43.5881025Z * [new branch] gh/XilunWu/171/base -> origin/gh/XilunWu/171/base 2025-12-04T08:57:43.5882092Z * [new branch] gh/XilunWu/171/head -> origin/gh/XilunWu/171/head 2025-12-04T08:57:43.5883213Z * [new branch] gh/XilunWu/171/orig -> origin/gh/XilunWu/171/orig 2025-12-04T08:57:43.5884705Z * [new branch] gh/XilunWu/173/base -> origin/gh/XilunWu/173/base 2025-12-04T08:57:43.5885796Z * [new branch] gh/XilunWu/173/head -> origin/gh/XilunWu/173/head 2025-12-04T08:57:43.5886925Z * [new branch] gh/XilunWu/173/orig -> origin/gh/XilunWu/173/orig 2025-12-04T08:57:43.5888450Z * [new branch] gh/XilunWu/175/base -> origin/gh/XilunWu/175/base 2025-12-04T08:57:43.5889515Z * [new branch] gh/XilunWu/175/head -> origin/gh/XilunWu/175/head 
2025-12-04T08:57:43.5890619Z * [new branch] gh/XilunWu/175/orig -> origin/gh/XilunWu/175/orig 2025-12-04T08:57:43.5892142Z * [new branch] gh/XilunWu/176/base -> origin/gh/XilunWu/176/base 2025-12-04T08:57:43.5893222Z * [new branch] gh/XilunWu/176/head -> origin/gh/XilunWu/176/head 2025-12-04T08:57:43.5894512Z * [new branch] gh/XilunWu/176/orig -> origin/gh/XilunWu/176/orig 2025-12-04T08:57:43.5897158Z * [new branch] gh/XuehaiPan/14/base -> origin/gh/XuehaiPan/14/base 2025-12-04T08:57:43.5898238Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-12-04T08:57:43.5899359Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-12-04T08:57:43.5900988Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-12-04T08:57:43.5902077Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-12-04T08:57:43.5903304Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-12-04T08:57:43.5904842Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-12-04T08:57:43.5905926Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-12-04T08:57:43.5907192Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-12-04T08:57:43.5908728Z * [new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-12-04T08:57:43.5909904Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-12-04T08:57:43.5911035Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-12-04T08:57:43.5912600Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-12-04T08:57:43.5913626Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-12-04T08:57:43.5914744Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 2025-12-04T08:57:43.5916167Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-12-04T08:57:43.5917176Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-12-04T08:57:43.5918296Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-12-04T08:57:43.5919784Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-12-04T08:57:43.5920962Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-12-04T08:57:43.5922395Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-12-04T08:57:43.5923930Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-12-04T08:57:43.5925013Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-12-04T08:57:43.5926148Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-12-04T08:57:43.5927759Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-12-04T08:57:43.5928833Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-12-04T08:57:43.5929999Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-12-04T08:57:43.5931555Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-12-04T08:57:43.5932590Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-12-04T08:57:43.5933806Z * [new branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig 2025-12-04T08:57:43.5935284Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-12-04T08:57:43.5936406Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-12-04T08:57:43.5937813Z * [new branch] 
gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-12-04T08:57:43.5939353Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-12-04T08:57:43.5940462Z * [new branch] gh/XuehaiPan/365/head -> origin/gh/XuehaiPan/365/head 2025-12-04T08:57:43.5941752Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-12-04T08:57:43.5943229Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-12-04T08:57:43.5944446Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-12-04T08:57:43.5945990Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-12-04T08:57:43.5947046Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-12-04T08:57:43.5948194Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-12-04T08:57:43.5949876Z * [new branch] gh/XuehaiPan/390/base -> origin/gh/XuehaiPan/390/base 2025-12-04T08:57:43.5950890Z * [new branch] gh/XuehaiPan/390/head -> origin/gh/XuehaiPan/390/head 2025-12-04T08:57:43.5951970Z * [new branch] gh/XuehaiPan/390/orig -> origin/gh/XuehaiPan/390/orig 2025-12-04T08:57:43.5953489Z * [new branch] gh/XuehaiPan/391/base -> origin/gh/XuehaiPan/391/base 2025-12-04T08:57:43.5954553Z * [new branch] gh/XuehaiPan/391/head -> origin/gh/XuehaiPan/391/head 2025-12-04T08:57:43.5955608Z * [new branch] gh/XuehaiPan/391/orig -> origin/gh/XuehaiPan/391/orig 2025-12-04T08:57:43.5957177Z * [new branch] gh/XuehaiPan/392/base -> origin/gh/XuehaiPan/392/base 2025-12-04T08:57:43.5958245Z * [new branch] gh/XuehaiPan/392/head -> origin/gh/XuehaiPan/392/head 2025-12-04T08:57:43.5959332Z * [new branch] gh/XuehaiPan/392/orig -> origin/gh/XuehaiPan/392/orig 2025-12-04T08:57:43.5961281Z * [new branch] gh/XuehaiPan/394/base -> origin/gh/XuehaiPan/394/base 2025-12-04T08:57:43.5962317Z * [new branch] gh/XuehaiPan/394/head -> origin/gh/XuehaiPan/394/head 2025-12-04T08:57:43.5963401Z * [new branch] gh/XuehaiPan/394/orig -> origin/gh/XuehaiPan/394/orig 2025-12-04T08:57:43.5964951Z * [new branch] gh/XuehaiPan/397/base -> origin/gh/XuehaiPan/397/base 2025-12-04T08:57:43.5966027Z * [new branch] gh/XuehaiPan/397/head -> origin/gh/XuehaiPan/397/head 2025-12-04T08:57:43.5967145Z * [new branch] gh/XuehaiPan/397/orig -> origin/gh/XuehaiPan/397/orig 2025-12-04T08:57:43.5968729Z * [new branch] gh/XuehaiPan/398/base -> origin/gh/XuehaiPan/398/base 2025-12-04T08:57:43.5969804Z * [new branch] gh/XuehaiPan/398/head -> origin/gh/XuehaiPan/398/head 2025-12-04T08:57:43.5970949Z * [new branch] gh/XuehaiPan/398/orig -> origin/gh/XuehaiPan/398/orig 2025-12-04T08:57:43.5972417Z * [new branch] gh/XuehaiPan/399/base -> origin/gh/XuehaiPan/399/base 2025-12-04T08:57:43.5973479Z * [new branch] gh/XuehaiPan/399/head -> origin/gh/XuehaiPan/399/head 2025-12-04T08:57:43.5974587Z * [new branch] gh/XuehaiPan/399/orig -> origin/gh/XuehaiPan/399/orig 2025-12-04T08:57:43.5976187Z * [new branch] gh/XuehaiPan/400/base -> origin/gh/XuehaiPan/400/base 2025-12-04T08:57:43.5977651Z * [new branch] gh/XuehaiPan/400/head -> origin/gh/XuehaiPan/400/head 2025-12-04T08:57:43.5978786Z * [new branch] gh/XuehaiPan/400/orig -> origin/gh/XuehaiPan/400/orig 2025-12-04T08:57:43.5980647Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-12-04T08:57:43.5981748Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-12-04T08:57:43.5982952Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-12-04T08:57:43.5984457Z * [new branch] 
gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-12-04T08:57:43.5985616Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 2025-12-04T08:57:43.5987036Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-12-04T08:57:43.5988010Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-12-04T08:57:43.5989736Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-12-04T08:57:43.5990794Z * [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-12-04T08:57:43.5992343Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-12-04T08:57:43.5993376Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-12-04T08:57:43.5995144Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-12-04T08:57:43.5995999Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-12-04T08:57:43.5997414Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-12-04T08:57:43.5998428Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-12-04T08:57:43.5999827Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-12-04T08:57:43.6000792Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-12-04T08:57:43.6001862Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-12-04T08:57:43.6003687Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-12-04T08:57:43.6004799Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-12-04T08:57:43.6006194Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-12-04T08:57:43.6007304Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 2025-12-04T08:57:43.6008858Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-12-04T08:57:43.6010273Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-12-04T08:57:43.6011325Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 2025-12-04T08:57:43.6013082Z * [new branch] gh/albanD/4/base -> origin/gh/albanD/4/base 2025-12-04T08:57:43.6014098Z * [new branch] gh/albanD/4/head -> origin/gh/albanD/4/head 2025-12-04T08:57:43.6015184Z * [new branch] gh/albanD/4/orig -> origin/gh/albanD/4/orig 2025-12-04T08:57:43.6017426Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-12-04T08:57:43.6018957Z * [new branch] gh/alexsamardzic/12/base -> origin/gh/alexsamardzic/12/base 2025-12-04T08:57:43.6020101Z * [new branch] gh/alexsamardzic/12/head -> origin/gh/alexsamardzic/12/head 2025-12-04T08:57:43.6021423Z * [new branch] gh/alexsamardzic/12/orig -> origin/gh/alexsamardzic/12/orig 2025-12-04T08:57:43.6023107Z * [new branch] gh/alexsamardzic/14/base -> origin/gh/alexsamardzic/14/base 2025-12-04T08:57:43.6024184Z * [new branch] gh/alexsamardzic/14/head -> origin/gh/alexsamardzic/14/head 2025-12-04T08:57:43.6025343Z * [new branch] gh/alexsamardzic/14/orig -> origin/gh/alexsamardzic/14/orig 2025-12-04T08:57:43.6026948Z * [new branch] gh/alexsamardzic/15/base -> origin/gh/alexsamardzic/15/base 2025-12-04T08:57:43.6028029Z * [new branch] gh/alexsamardzic/15/head -> origin/gh/alexsamardzic/15/head 2025-12-04T08:57:43.6029149Z * [new branch] gh/alexsamardzic/15/orig -> origin/gh/alexsamardzic/15/orig 2025-12-04T08:57:43.6031119Z * [new branch] gh/amjames/18/base 
-> origin/gh/amjames/18/base 2025-12-04T08:57:43.6032056Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-12-04T08:57:43.6033278Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-12-04T08:57:43.6035354Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-12-04T08:57:43.6036454Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-12-04T08:57:43.6037597Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-12-04T08:57:43.6039262Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base 2025-12-04T08:57:43.6040471Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-12-04T08:57:43.6041625Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-12-04T08:57:43.6043514Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-12-04T08:57:43.6044778Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-12-04T08:57:43.6046382Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-12-04T08:57:43.6047616Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-12-04T08:57:43.6049148Z * [new branch] gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-12-04T08:57:43.6050265Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-12-04T08:57:43.6051438Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-12-04T08:57:43.6053175Z * [new branch] gh/andyanwang/42/base -> origin/gh/andyanwang/42/base 2025-12-04T08:57:43.6054147Z * [new branch] gh/andyanwang/42/head -> origin/gh/andyanwang/42/head 2025-12-04T08:57:43.6055249Z * [new branch] gh/andyanwang/42/orig -> origin/gh/andyanwang/42/orig 2025-12-04T08:57:43.6057217Z * [new branch] gh/andyanwang/45/base -> origin/gh/andyanwang/45/base 2025-12-04T08:57:43.6058403Z * [new branch] gh/andyanwang/45/head -> origin/gh/andyanwang/45/head 2025-12-04T08:57:43.6059572Z * [new branch] gh/andyanwang/45/orig -> origin/gh/andyanwang/45/orig 2025-12-04T08:57:43.6061402Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-12-04T08:57:43.6062462Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-12-04T08:57:43.6064065Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-12-04T08:57:43.6065232Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-12-04T08:57:43.6066391Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 2025-12-04T08:57:43.6067965Z * [new branch] gh/angelayi/116/base -> origin/gh/angelayi/116/base 2025-12-04T08:57:43.6069133Z * [new branch] gh/angelayi/116/head -> origin/gh/angelayi/116/head 2025-12-04T08:57:43.6070235Z * [new branch] gh/angelayi/116/orig -> origin/gh/angelayi/116/orig 2025-12-04T08:57:43.6071865Z * [new branch] gh/angelayi/122/base -> origin/gh/angelayi/122/base 2025-12-04T08:57:43.6072827Z * [new branch] gh/angelayi/122/head -> origin/gh/angelayi/122/head 2025-12-04T08:57:43.6073943Z * [new branch] gh/angelayi/122/orig -> origin/gh/angelayi/122/orig 2025-12-04T08:57:43.6075624Z * [new branch] gh/angelayi/124/base -> origin/gh/angelayi/124/base 2025-12-04T08:57:43.6076622Z * [new branch] gh/angelayi/124/head -> origin/gh/angelayi/124/head 2025-12-04T08:57:43.6077831Z * [new branch] gh/angelayi/124/orig -> origin/gh/angelayi/124/orig 2025-12-04T08:57:43.6079517Z * [new branch] gh/angelayi/128/base -> origin/gh/angelayi/128/base 2025-12-04T08:57:43.6080650Z * [new 
branch] gh/angelayi/128/head -> origin/gh/angelayi/128/head 2025-12-04T08:57:43.6081739Z * [new branch] gh/angelayi/128/orig -> origin/gh/angelayi/128/orig 2025-12-04T08:57:43.6083268Z * [new branch] gh/angelayi/131/base -> origin/gh/angelayi/131/base 2025-12-04T08:57:43.6084320Z * [new branch] gh/angelayi/131/head -> origin/gh/angelayi/131/head 2025-12-04T08:57:43.6085444Z * [new branch] gh/angelayi/131/orig -> origin/gh/angelayi/131/orig 2025-12-04T08:57:43.6087178Z * [new branch] gh/angelayi/132/base -> origin/gh/angelayi/132/base 2025-12-04T08:57:43.6088545Z * [new branch] gh/angelayi/132/head -> origin/gh/angelayi/132/head 2025-12-04T08:57:43.6089729Z * [new branch] gh/angelayi/132/orig -> origin/gh/angelayi/132/orig 2025-12-04T08:57:43.6091214Z * [new branch] gh/angelayi/133/base -> origin/gh/angelayi/133/base 2025-12-04T08:57:43.6092279Z * [new branch] gh/angelayi/133/head -> origin/gh/angelayi/133/head 2025-12-04T08:57:43.6093372Z * [new branch] gh/angelayi/133/orig -> origin/gh/angelayi/133/orig 2025-12-04T08:57:43.6095170Z * [new branch] gh/angelayi/134/base -> origin/gh/angelayi/134/base 2025-12-04T08:57:43.6096587Z * [new branch] gh/angelayi/134/head -> origin/gh/angelayi/134/head 2025-12-04T08:57:43.6097977Z * [new branch] gh/angelayi/134/orig -> origin/gh/angelayi/134/orig 2025-12-04T08:57:43.6099802Z * [new branch] gh/angelayi/135/base -> origin/gh/angelayi/135/base 2025-12-04T08:57:43.6100988Z * [new branch] gh/angelayi/135/head -> origin/gh/angelayi/135/head 2025-12-04T08:57:43.6102138Z * [new branch] gh/angelayi/135/orig -> origin/gh/angelayi/135/orig 2025-12-04T08:57:43.6103699Z * [new branch] gh/angelayi/136/base -> origin/gh/angelayi/136/base 2025-12-04T08:57:43.6104784Z * [new branch] gh/angelayi/136/head -> origin/gh/angelayi/136/head 2025-12-04T08:57:43.6105922Z * [new branch] gh/angelayi/136/orig -> origin/gh/angelayi/136/orig 2025-12-04T08:57:43.6107482Z * [new branch] gh/angelayi/137/base -> origin/gh/angelayi/137/base 2025-12-04T08:57:43.6108516Z * [new branch] gh/angelayi/137/head -> origin/gh/angelayi/137/head 2025-12-04T08:57:43.6110086Z * [new branch] gh/angelayi/137/orig -> origin/gh/angelayi/137/orig 2025-12-04T08:57:43.6111484Z * [new branch] gh/angelayi/138/base -> origin/gh/angelayi/138/base 2025-12-04T08:57:43.6112465Z * [new branch] gh/angelayi/138/head -> origin/gh/angelayi/138/head 2025-12-04T08:57:43.6113539Z * [new branch] gh/angelayi/138/orig -> origin/gh/angelayi/138/orig 2025-12-04T08:57:43.6115073Z * [new branch] gh/angelayi/139/base -> origin/gh/angelayi/139/base 2025-12-04T08:57:43.6116110Z * [new branch] gh/angelayi/139/head -> origin/gh/angelayi/139/head 2025-12-04T08:57:43.6117197Z * [new branch] gh/angelayi/139/orig -> origin/gh/angelayi/139/orig 2025-12-04T08:57:43.6118811Z * [new branch] gh/angelayi/140/base -> origin/gh/angelayi/140/base 2025-12-04T08:57:43.6119933Z * [new branch] gh/angelayi/140/head -> origin/gh/angelayi/140/head 2025-12-04T08:57:43.6121362Z * [new branch] gh/angelayi/140/orig -> origin/gh/angelayi/140/orig 2025-12-04T08:57:43.6126621Z * [new branch] gh/angelayi/141/base -> origin/gh/angelayi/141/base 2025-12-04T08:57:43.6127919Z * [new branch] gh/angelayi/141/head -> origin/gh/angelayi/141/head 2025-12-04T08:57:43.6128945Z * [new branch] gh/angelayi/141/orig -> origin/gh/angelayi/141/orig 2025-12-04T08:57:43.6130589Z * [new branch] gh/angelayi/142/base -> origin/gh/angelayi/142/base 2025-12-04T08:57:43.6131649Z * [new branch] gh/angelayi/142/head -> origin/gh/angelayi/142/head 2025-12-04T08:57:43.6132777Z * [new 
branch] gh/angelayi/142/orig -> origin/gh/angelayi/142/orig 2025-12-04T08:57:43.6134512Z * [new branch] gh/angelayi/143/base -> origin/gh/angelayi/143/base 2025-12-04T08:57:43.6135552Z * [new branch] gh/angelayi/143/head -> origin/gh/angelayi/143/head 2025-12-04T08:57:43.6136881Z * [new branch] gh/angelayi/143/orig -> origin/gh/angelayi/143/orig 2025-12-04T08:57:43.6138618Z * [new branch] gh/angelayi/144/base -> origin/gh/angelayi/144/base 2025-12-04T08:57:43.6139808Z * [new branch] gh/angelayi/144/head -> origin/gh/angelayi/144/head 2025-12-04T08:57:43.6140995Z * [new branch] gh/angelayi/144/orig -> origin/gh/angelayi/144/orig 2025-12-04T08:57:43.6143046Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-12-04T08:57:43.6144127Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-12-04T08:57:43.6145265Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-12-04T08:57:43.6146977Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-12-04T08:57:43.6148048Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-12-04T08:57:43.6149384Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-12-04T08:57:43.6150982Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-12-04T08:57:43.6152109Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head 2025-12-04T08:57:43.6153208Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-12-04T08:57:43.6154847Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-12-04T08:57:43.6155889Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-12-04T08:57:43.6157001Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-12-04T08:57:43.6158589Z * [new branch] gh/anijain2305/870/base -> origin/gh/anijain2305/870/base 2025-12-04T08:57:43.6159566Z * [new branch] gh/anijain2305/870/head -> origin/gh/anijain2305/870/head 2025-12-04T08:57:43.6160625Z * [new branch] gh/anijain2305/870/orig -> origin/gh/anijain2305/870/orig 2025-12-04T08:57:43.6162306Z * [new branch] gh/anijain2305/873/base -> origin/gh/anijain2305/873/base 2025-12-04T08:57:43.6163262Z * [new branch] gh/anijain2305/873/head -> origin/gh/anijain2305/873/head 2025-12-04T08:57:43.6164320Z * [new branch] gh/anijain2305/873/orig -> origin/gh/anijain2305/873/orig 2025-12-04T08:57:43.6165830Z * [new branch] gh/anijain2305/894/base -> origin/gh/anijain2305/894/base 2025-12-04T08:57:43.6166859Z * [new branch] gh/anijain2305/894/head -> origin/gh/anijain2305/894/head 2025-12-04T08:57:43.6167951Z * [new branch] gh/anijain2305/894/orig -> origin/gh/anijain2305/894/orig 2025-12-04T08:57:43.6169506Z * [new branch] gh/anijain2305/895/base -> origin/gh/anijain2305/895/base 2025-12-04T08:57:43.6170596Z * [new branch] gh/anijain2305/895/head -> origin/gh/anijain2305/895/head 2025-12-04T08:57:43.6171695Z * [new branch] gh/anijain2305/895/orig -> origin/gh/anijain2305/895/orig 2025-12-04T08:57:43.6173352Z * [new branch] gh/anijain2305/910/base -> origin/gh/anijain2305/910/base 2025-12-04T08:57:43.6174358Z * [new branch] gh/anijain2305/910/head -> origin/gh/anijain2305/910/head 2025-12-04T08:57:43.6175468Z * [new branch] gh/anijain2305/910/orig -> origin/gh/anijain2305/910/orig 2025-12-04T08:57:43.6177516Z * [new branch] gh/anijain2305/919/base -> origin/gh/anijain2305/919/base 2025-12-04T08:57:43.6178636Z * [new branch] 
gh/anijain2305/919/head -> origin/gh/anijain2305/919/head 2025-12-04T08:57:43.6179772Z * [new branch] gh/anijain2305/919/orig -> origin/gh/anijain2305/919/orig 2025-12-04T08:57:43.6181361Z * [new branch] gh/anijain2305/922/base -> origin/gh/anijain2305/922/base 2025-12-04T08:57:43.6182474Z * [new branch] gh/anijain2305/922/head -> origin/gh/anijain2305/922/head 2025-12-04T08:57:43.6183623Z * [new branch] gh/anijain2305/922/orig -> origin/gh/anijain2305/922/orig 2025-12-04T08:57:43.6185215Z * [new branch] gh/anijain2305/932/base -> origin/gh/anijain2305/932/base 2025-12-04T08:57:43.6186479Z * [new branch] gh/anijain2305/932/head -> origin/gh/anijain2305/932/head 2025-12-04T08:57:43.6187724Z * [new branch] gh/anijain2305/932/orig -> origin/gh/anijain2305/932/orig 2025-12-04T08:57:43.6189382Z * [new branch] gh/anijain2305/940/base -> origin/gh/anijain2305/940/base 2025-12-04T08:57:43.6190418Z * [new branch] gh/anijain2305/940/head -> origin/gh/anijain2305/940/head 2025-12-04T08:57:43.6191517Z * [new branch] gh/anijain2305/940/orig -> origin/gh/anijain2305/940/orig 2025-12-04T08:57:43.6193070Z * [new branch] gh/anijain2305/941/base -> origin/gh/anijain2305/941/base 2025-12-04T08:57:43.6194103Z * [new branch] gh/anijain2305/941/head -> origin/gh/anijain2305/941/head 2025-12-04T08:57:43.6195254Z * [new branch] gh/anijain2305/941/orig -> origin/gh/anijain2305/941/orig 2025-12-04T08:57:43.6196731Z * [new branch] gh/anijain2305/942/base -> origin/gh/anijain2305/942/base 2025-12-04T08:57:43.6197865Z * [new branch] gh/anijain2305/942/head -> origin/gh/anijain2305/942/head 2025-12-04T08:57:43.6199036Z * [new branch] gh/anijain2305/942/orig -> origin/gh/anijain2305/942/orig 2025-12-04T08:57:43.6200607Z * [new branch] gh/anijain2305/943/base -> origin/gh/anijain2305/943/base 2025-12-04T08:57:43.6201641Z * [new branch] gh/anijain2305/943/head -> origin/gh/anijain2305/943/head 2025-12-04T08:57:43.6202752Z * [new branch] gh/anijain2305/943/orig -> origin/gh/anijain2305/943/orig 2025-12-04T08:57:43.6204826Z * [new branch] gh/anijain2305/944/base -> origin/gh/anijain2305/944/base 2025-12-04T08:57:43.6205857Z * [new branch] gh/anijain2305/944/head -> origin/gh/anijain2305/944/head 2025-12-04T08:57:43.6206932Z * [new branch] gh/anijain2305/944/orig -> origin/gh/anijain2305/944/orig 2025-12-04T08:57:43.6209227Z * [new branch] gh/anijain2305/945/base -> origin/gh/anijain2305/945/base 2025-12-04T08:57:43.6210382Z * [new branch] gh/anijain2305/945/head -> origin/gh/anijain2305/945/head 2025-12-04T08:57:43.6211475Z * [new branch] gh/anijain2305/945/orig -> origin/gh/anijain2305/945/orig 2025-12-04T08:57:43.6213079Z * [new branch] gh/anijain2305/946/base -> origin/gh/anijain2305/946/base 2025-12-04T08:57:43.6214131Z * [new branch] gh/anijain2305/946/head -> origin/gh/anijain2305/946/head 2025-12-04T08:57:43.6215367Z * [new branch] gh/anijain2305/946/orig -> origin/gh/anijain2305/946/orig 2025-12-04T08:57:43.6217226Z * [new branch] gh/anijain2305/947/base -> origin/gh/anijain2305/947/base 2025-12-04T08:57:43.6218408Z * [new branch] gh/anijain2305/947/head -> origin/gh/anijain2305/947/head 2025-12-04T08:57:43.6219465Z * [new branch] gh/anijain2305/947/orig -> origin/gh/anijain2305/947/orig 2025-12-04T08:57:43.6221373Z * [new branch] gh/anijain2305/948/base -> origin/gh/anijain2305/948/base 2025-12-04T08:57:43.6222467Z * [new branch] gh/anijain2305/948/head -> origin/gh/anijain2305/948/head 2025-12-04T08:57:43.6223589Z * [new branch] gh/anijain2305/948/orig -> origin/gh/anijain2305/948/orig 2025-12-04T08:57:43.6225195Z 
* [new branch] gh/anijain2305/949/base -> origin/gh/anijain2305/949/base 2025-12-04T08:57:43.6226285Z * [new branch] gh/anijain2305/949/head -> origin/gh/anijain2305/949/head 2025-12-04T08:57:43.6227470Z * [new branch] gh/anijain2305/949/orig -> origin/gh/anijain2305/949/orig 2025-12-04T08:57:43.6229106Z * [new branch] gh/anijain2305/950/base -> origin/gh/anijain2305/950/base 2025-12-04T08:57:43.6230231Z * [new branch] gh/anijain2305/950/head -> origin/gh/anijain2305/950/head 2025-12-04T08:57:43.6231340Z * [new branch] gh/anijain2305/950/orig -> origin/gh/anijain2305/950/orig 2025-12-04T08:57:43.6233134Z * [new branch] gh/anijain2305/951/base -> origin/gh/anijain2305/951/base 2025-12-04T08:57:43.6234165Z * [new branch] gh/anijain2305/951/head -> origin/gh/anijain2305/951/head 2025-12-04T08:57:43.6235247Z * [new branch] gh/anijain2305/951/orig -> origin/gh/anijain2305/951/orig 2025-12-04T08:57:43.6236846Z * [new branch] gh/anijain2305/952/base -> origin/gh/anijain2305/952/base 2025-12-04T08:57:43.6237877Z * [new branch] gh/anijain2305/952/head -> origin/gh/anijain2305/952/head 2025-12-04T08:57:43.6238962Z * [new branch] gh/anijain2305/952/orig -> origin/gh/anijain2305/952/orig 2025-12-04T08:57:43.6240498Z * [new branch] gh/anijain2305/953/base -> origin/gh/anijain2305/953/base 2025-12-04T08:57:43.6241504Z * [new branch] gh/anijain2305/953/head -> origin/gh/anijain2305/953/head 2025-12-04T08:57:43.6242611Z * [new branch] gh/anijain2305/953/orig -> origin/gh/anijain2305/953/orig 2025-12-04T08:57:43.6244190Z * [new branch] gh/anijain2305/954/base -> origin/gh/anijain2305/954/base 2025-12-04T08:57:43.6245305Z * [new branch] gh/anijain2305/954/head -> origin/gh/anijain2305/954/head 2025-12-04T08:57:43.6246950Z * [new branch] gh/anijain2305/954/orig -> origin/gh/anijain2305/954/orig 2025-12-04T08:57:43.6248740Z * [new branch] gh/anijain2305/955/base -> origin/gh/anijain2305/955/base 2025-12-04T08:57:43.6249641Z * [new branch] gh/anijain2305/955/head -> origin/gh/anijain2305/955/head 2025-12-04T08:57:43.6250737Z * [new branch] gh/anijain2305/955/orig -> origin/gh/anijain2305/955/orig 2025-12-04T08:57:43.6252404Z * [new branch] gh/anijain2305/956/base -> origin/gh/anijain2305/956/base 2025-12-04T08:57:43.6253459Z * [new branch] gh/anijain2305/956/head -> origin/gh/anijain2305/956/head 2025-12-04T08:57:43.6254589Z * [new branch] gh/anijain2305/956/orig -> origin/gh/anijain2305/956/orig 2025-12-04T08:57:43.6256256Z * [new branch] gh/anijain2305/957/base -> origin/gh/anijain2305/957/base 2025-12-04T08:57:43.6257677Z * [new branch] gh/anijain2305/957/head -> origin/gh/anijain2305/957/head 2025-12-04T08:57:43.6258824Z * [new branch] gh/anijain2305/957/orig -> origin/gh/anijain2305/957/orig 2025-12-04T08:57:43.6260482Z * [new branch] gh/anijain2305/958/base -> origin/gh/anijain2305/958/base 2025-12-04T08:57:43.6261610Z * [new branch] gh/anijain2305/958/head -> origin/gh/anijain2305/958/head 2025-12-04T08:57:43.6262833Z * [new branch] gh/anijain2305/958/orig -> origin/gh/anijain2305/958/orig 2025-12-04T08:57:43.6264315Z * [new branch] gh/anijain2305/959/base -> origin/gh/anijain2305/959/base 2025-12-04T08:57:43.6265388Z * [new branch] gh/anijain2305/959/head -> origin/gh/anijain2305/959/head 2025-12-04T08:57:43.6266530Z * [new branch] gh/anijain2305/959/orig -> origin/gh/anijain2305/959/orig 2025-12-04T08:57:43.6268270Z * [new branch] gh/anijain2305/960/base -> origin/gh/anijain2305/960/base 2025-12-04T08:57:43.6269477Z * [new branch] gh/anijain2305/960/head -> origin/gh/anijain2305/960/head 
2025-12-04T08:57:43.6270601Z * [new branch] gh/anijain2305/960/orig -> origin/gh/anijain2305/960/orig 2025-12-04T08:57:43.6272269Z * [new branch] gh/anijain2305/961/base -> origin/gh/anijain2305/961/base 2025-12-04T08:57:43.6273302Z * [new branch] gh/anijain2305/961/head -> origin/gh/anijain2305/961/head 2025-12-04T08:57:43.6274389Z * [new branch] gh/anijain2305/961/orig -> origin/gh/anijain2305/961/orig 2025-12-04T08:57:43.6276094Z * [new branch] gh/anijain2305/962/base -> origin/gh/anijain2305/962/base 2025-12-04T08:57:43.6277107Z * [new branch] gh/anijain2305/962/head -> origin/gh/anijain2305/962/head 2025-12-04T08:57:43.6278203Z * [new branch] gh/anijain2305/962/orig -> origin/gh/anijain2305/962/orig 2025-12-04T08:57:43.6280151Z * [new branch] gh/anijain2305/963/base -> origin/gh/anijain2305/963/base 2025-12-04T08:57:43.6281370Z * [new branch] gh/anijain2305/963/head -> origin/gh/anijain2305/963/head 2025-12-04T08:57:43.6282726Z * [new branch] gh/anijain2305/963/orig -> origin/gh/anijain2305/963/orig 2025-12-04T08:57:43.6284338Z * [new branch] gh/anijain2305/964/base -> origin/gh/anijain2305/964/base 2025-12-04T08:57:43.6285416Z * [new branch] gh/anijain2305/964/head -> origin/gh/anijain2305/964/head 2025-12-04T08:57:43.6286482Z * [new branch] gh/anijain2305/964/orig -> origin/gh/anijain2305/964/orig 2025-12-04T08:57:43.6288453Z * [new branch] gh/anijain2305/965/base -> origin/gh/anijain2305/965/base 2025-12-04T08:57:43.6289537Z * [new branch] gh/anijain2305/965/head -> origin/gh/anijain2305/965/head 2025-12-04T08:57:43.6290713Z * [new branch] gh/anijain2305/965/orig -> origin/gh/anijain2305/965/orig 2025-12-04T08:57:43.6292233Z * [new branch] gh/anijain2305/966/base -> origin/gh/anijain2305/966/base 2025-12-04T08:57:43.6293283Z * [new branch] gh/anijain2305/966/head -> origin/gh/anijain2305/966/head 2025-12-04T08:57:43.6294373Z * [new branch] gh/anijain2305/966/orig -> origin/gh/anijain2305/966/orig 2025-12-04T08:57:43.6295954Z * [new branch] gh/anijain2305/967/base -> origin/gh/anijain2305/967/base 2025-12-04T08:57:43.6297331Z * [new branch] gh/anijain2305/967/head -> origin/gh/anijain2305/967/head 2025-12-04T08:57:43.6298605Z * [new branch] gh/anijain2305/967/orig -> origin/gh/anijain2305/967/orig 2025-12-04T08:57:43.6300222Z * [new branch] gh/anijain2305/968/base -> origin/gh/anijain2305/968/base 2025-12-04T08:57:43.6301333Z * [new branch] gh/anijain2305/968/head -> origin/gh/anijain2305/968/head 2025-12-04T08:57:43.6302467Z * [new branch] gh/anijain2305/968/orig -> origin/gh/anijain2305/968/orig 2025-12-04T08:57:43.6304029Z * [new branch] gh/anijain2305/969/base -> origin/gh/anijain2305/969/base 2025-12-04T08:57:43.6305143Z * [new branch] gh/anijain2305/969/head -> origin/gh/anijain2305/969/head 2025-12-04T08:57:43.6306343Z * [new branch] gh/anijain2305/969/orig -> origin/gh/anijain2305/969/orig 2025-12-04T08:57:43.6308198Z * [new branch] gh/anijain2305/970/base -> origin/gh/anijain2305/970/base 2025-12-04T08:57:43.6309357Z * [new branch] gh/anijain2305/970/head -> origin/gh/anijain2305/970/head 2025-12-04T08:57:43.6310521Z * [new branch] gh/anijain2305/970/orig -> origin/gh/anijain2305/970/orig 2025-12-04T08:57:43.6312355Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-12-04T08:57:43.6313419Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-12-04T08:57:43.6314525Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-12-04T08:57:43.6316487Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 
2025-12-04T08:57:43.6317573Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-12-04T08:57:43.6318953Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-12-04T08:57:43.6320002Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-12-04T08:57:43.6321817Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-12-04T08:57:43.6322862Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-12-04T08:57:43.6324256Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-12-04T08:57:43.6325650Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-12-04T08:57:43.6327116Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-12-04T08:57:43.6328178Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-12-04T08:57:43.6329901Z * [new branch] gh/anshul-si/53/base -> origin/gh/anshul-si/53/base 2025-12-04T08:57:43.6330996Z * [new branch] gh/anshul-si/53/head -> origin/gh/anshul-si/53/head 2025-12-04T08:57:43.6332595Z * [new branch] gh/anshul-si/58/base -> origin/gh/anshul-si/58/base 2025-12-04T08:57:43.6333822Z * [new branch] gh/anshul-si/58/head -> origin/gh/anshul-si/58/head 2025-12-04T08:57:43.6335284Z * [new branch] gh/anshul-si/66/base -> origin/gh/anshul-si/66/base 2025-12-04T08:57:43.6336393Z * [new branch] gh/anshul-si/66/head -> origin/gh/anshul-si/66/head 2025-12-04T08:57:43.6337802Z * [new branch] gh/anshul-si/66/orig -> origin/gh/anshul-si/66/orig 2025-12-04T08:57:43.6339238Z * [new branch] gh/anshul-si/67/base -> origin/gh/anshul-si/67/base 2025-12-04T08:57:43.6340364Z * [new branch] gh/anshul-si/67/head -> origin/gh/anshul-si/67/head 2025-12-04T08:57:43.6341471Z * [new branch] gh/anshul-si/67/orig -> origin/gh/anshul-si/67/orig 2025-12-04T08:57:43.6343232Z * [new branch] gh/anshul-si/68/base -> origin/gh/anshul-si/68/base 2025-12-04T08:57:43.6344252Z * [new branch] gh/anshul-si/68/head -> origin/gh/anshul-si/68/head 2025-12-04T08:57:43.6345324Z * [new branch] gh/anshul-si/68/orig -> origin/gh/anshul-si/68/orig 2025-12-04T08:57:43.6347164Z * [new branch] gh/anshul-si/69/base -> origin/gh/anshul-si/69/base 2025-12-04T08:57:43.6348203Z * [new branch] gh/anshul-si/69/head -> origin/gh/anshul-si/69/head 2025-12-04T08:57:43.6349424Z * [new branch] gh/anshul-si/69/orig -> origin/gh/anshul-si/69/orig 2025-12-04T08:57:43.6351260Z * [new branch] gh/anshul-si/70/base -> origin/gh/anshul-si/70/base 2025-12-04T08:57:43.6352360Z * [new branch] gh/anshul-si/70/head -> origin/gh/anshul-si/70/head 2025-12-04T08:57:43.6353480Z * [new branch] gh/anshul-si/70/orig -> origin/gh/anshul-si/70/orig 2025-12-04T08:57:43.6355231Z * [new branch] gh/anshul-si/71/base -> origin/gh/anshul-si/71/base 2025-12-04T08:57:43.6356178Z * [new branch] gh/anshul-si/71/head -> origin/gh/anshul-si/71/head 2025-12-04T08:57:43.6357256Z * [new branch] gh/anshul-si/71/orig -> origin/gh/anshul-si/71/orig 2025-12-04T08:57:43.6358856Z * [new branch] gh/anshul-si/72/base -> origin/gh/anshul-si/72/base 2025-12-04T08:57:43.6359951Z * [new branch] gh/anshul-si/72/head -> origin/gh/anshul-si/72/head 2025-12-04T08:57:43.6361076Z * [new branch] gh/anshul-si/72/orig -> origin/gh/anshul-si/72/orig 2025-12-04T08:57:43.6362587Z * [new branch] gh/anshul-si/73/base -> origin/gh/anshul-si/73/base 2025-12-04T08:57:43.6363696Z * [new branch] gh/anshul-si/73/head -> origin/gh/anshul-si/73/head 2025-12-04T08:57:43.6364826Z * [new branch] gh/anshul-si/73/orig -> origin/gh/anshul-si/73/orig 
2025-12-04T08:57:43.6366813Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-12-04T08:57:43.6367854Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-12-04T08:57:43.6369584Z * [new branch] gh/aorenste/134/base -> origin/gh/aorenste/134/base 2025-12-04T08:57:43.6370766Z * [new branch] gh/aorenste/134/head -> origin/gh/aorenste/134/head 2025-12-04T08:57:43.6371880Z * [new branch] gh/aorenste/134/orig -> origin/gh/aorenste/134/orig 2025-12-04T08:57:43.6373495Z * [new branch] gh/aorenste/139/base -> origin/gh/aorenste/139/base 2025-12-04T08:57:43.6374522Z * [new branch] gh/aorenste/139/head -> origin/gh/aorenste/139/head 2025-12-04T08:57:43.6375664Z * [new branch] gh/aorenste/139/orig -> origin/gh/aorenste/139/orig 2025-12-04T08:57:43.6377637Z * [new branch] gh/aorenste/141/base -> origin/gh/aorenste/141/base 2025-12-04T08:57:43.6378590Z * [new branch] gh/aorenste/141/head -> origin/gh/aorenste/141/head 2025-12-04T08:57:43.6380542Z * [new branch] gh/aorenste/145/base -> origin/gh/aorenste/145/base 2025-12-04T08:57:43.6381695Z * [new branch] gh/aorenste/145/head -> origin/gh/aorenste/145/head 2025-12-04T08:57:43.6383106Z * [new branch] gh/aorenste/145/orig -> origin/gh/aorenste/145/orig 2025-12-04T08:57:43.6384701Z * [new branch] gh/aorenste/146/base -> origin/gh/aorenste/146/base 2025-12-04T08:57:43.6385925Z * [new branch] gh/aorenste/146/head -> origin/gh/aorenste/146/head 2025-12-04T08:57:43.6387081Z * [new branch] gh/aorenste/146/orig -> origin/gh/aorenste/146/orig 2025-12-04T08:57:43.6388882Z * [new branch] gh/aorenste/147/base -> origin/gh/aorenste/147/base 2025-12-04T08:57:43.6390083Z * [new branch] gh/aorenste/147/head -> origin/gh/aorenste/147/head 2025-12-04T08:57:43.6391187Z * [new branch] gh/aorenste/147/orig -> origin/gh/aorenste/147/orig 2025-12-04T08:57:43.6392726Z * [new branch] gh/aorenste/148/base -> origin/gh/aorenste/148/base 2025-12-04T08:57:43.6393836Z * [new branch] gh/aorenste/148/head -> origin/gh/aorenste/148/head 2025-12-04T08:57:43.6395004Z * [new branch] gh/aorenste/148/orig -> origin/gh/aorenste/148/orig 2025-12-04T08:57:43.6396545Z * [new branch] gh/aorenste/149/base -> origin/gh/aorenste/149/base 2025-12-04T08:57:43.6397681Z * [new branch] gh/aorenste/149/head -> origin/gh/aorenste/149/head 2025-12-04T08:57:43.6398793Z * [new branch] gh/aorenste/149/orig -> origin/gh/aorenste/149/orig 2025-12-04T08:57:43.6400321Z * [new branch] gh/aorenste/150/base -> origin/gh/aorenste/150/base 2025-12-04T08:57:43.6401428Z * [new branch] gh/aorenste/150/head -> origin/gh/aorenste/150/head 2025-12-04T08:57:43.6402493Z * [new branch] gh/aorenste/150/orig -> origin/gh/aorenste/150/orig 2025-12-04T08:57:43.6403910Z * [new branch] gh/aorenste/151/base -> origin/gh/aorenste/151/base 2025-12-04T08:57:43.6405016Z * [new branch] gh/aorenste/151/head -> origin/gh/aorenste/151/head 2025-12-04T08:57:43.6406206Z * [new branch] gh/aorenste/151/orig -> origin/gh/aorenste/151/orig 2025-12-04T08:57:43.6407766Z * [new branch] gh/aorenste/152/base -> origin/gh/aorenste/152/base 2025-12-04T08:57:43.6408801Z * [new branch] gh/aorenste/152/head -> origin/gh/aorenste/152/head 2025-12-04T08:57:43.6409881Z * [new branch] gh/aorenste/152/orig -> origin/gh/aorenste/152/orig 2025-12-04T08:57:43.6411250Z * [new branch] gh/aorenste/153/base -> origin/gh/aorenste/153/base 2025-12-04T08:57:43.6412372Z * [new branch] gh/aorenste/153/head -> origin/gh/aorenste/153/head 2025-12-04T08:57:43.6413441Z * [new branch] gh/aorenste/153/orig -> origin/gh/aorenste/153/orig 
2025-12-04T08:57:43.6414836Z * [new branch] gh/aorenste/154/base -> origin/gh/aorenste/154/base 2025-12-04T08:57:43.6415900Z * [new branch] gh/aorenste/154/head -> origin/gh/aorenste/154/head 2025-12-04T08:57:43.6417718Z * [new branch] gh/aorenste/154/orig -> origin/gh/aorenste/154/orig 2025-12-04T08:57:43.6418770Z * [new branch] gh/aorenste/155/base -> origin/gh/aorenste/155/base 2025-12-04T08:57:43.6419880Z * [new branch] gh/aorenste/155/head -> origin/gh/aorenste/155/head 2025-12-04T08:57:43.6421099Z * [new branch] gh/aorenste/155/orig -> origin/gh/aorenste/155/orig 2025-12-04T08:57:43.6422608Z * [new branch] gh/aorenste/156/base -> origin/gh/aorenste/156/base 2025-12-04T08:57:43.6423691Z * [new branch] gh/aorenste/156/head -> origin/gh/aorenste/156/head 2025-12-04T08:57:43.6424724Z * [new branch] gh/aorenste/156/orig -> origin/gh/aorenste/156/orig 2025-12-04T08:57:43.6426581Z * [new branch] gh/aorenste/157/base -> origin/gh/aorenste/157/base 2025-12-04T08:57:43.6427844Z * [new branch] gh/aorenste/157/head -> origin/gh/aorenste/157/head 2025-12-04T08:57:43.6428906Z * [new branch] gh/aorenste/157/orig -> origin/gh/aorenste/157/orig 2025-12-04T08:57:43.6430402Z * [new branch] gh/aorenste/158/base -> origin/gh/aorenste/158/base 2025-12-04T08:57:43.6431556Z * [new branch] gh/aorenste/158/head -> origin/gh/aorenste/158/head 2025-12-04T08:57:43.6432629Z * [new branch] gh/aorenste/158/orig -> origin/gh/aorenste/158/orig 2025-12-04T08:57:43.6434143Z * [new branch] gh/aorenste/159/base -> origin/gh/aorenste/159/base 2025-12-04T08:57:43.6435219Z * [new branch] gh/aorenste/159/head -> origin/gh/aorenste/159/head 2025-12-04T08:57:43.6436223Z * [new branch] gh/aorenste/159/orig -> origin/gh/aorenste/159/orig 2025-12-04T08:57:43.6438066Z * [new branch] gh/avikchaudhuri/1/base -> origin/gh/avikchaudhuri/1/base 2025-12-04T08:57:43.6439151Z * [new branch] gh/avikchaudhuri/1/head -> origin/gh/avikchaudhuri/1/head 2025-12-04T08:57:43.6440526Z * [new branch] gh/avikchaudhuri/2/base -> origin/gh/avikchaudhuri/2/base 2025-12-04T08:57:43.6441592Z * [new branch] gh/avikchaudhuri/2/head -> origin/gh/avikchaudhuri/2/head 2025-12-04T08:57:43.6442799Z * [new branch] gh/avikchaudhuri/2/orig -> origin/gh/avikchaudhuri/2/orig 2025-12-04T08:57:43.6444855Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-12-04T08:57:43.6446052Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-12-04T08:57:43.6447072Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-12-04T08:57:43.6468447Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-12-04T08:57:43.6469360Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-12-04T08:57:43.6469962Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-12-04T08:57:43.6470576Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-12-04T08:57:43.6471190Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-12-04T08:57:43.6471797Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-12-04T08:57:43.6472394Z * [new branch] gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-12-04T08:57:43.6473017Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-12-04T08:57:43.6473633Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-12-04T08:57:43.6474223Z * [new branch] gh/bdhirsh/672/base -> origin/gh/bdhirsh/672/base 2025-12-04T08:57:43.6474825Z * [new branch] gh/bdhirsh/672/head -> origin/gh/bdhirsh/672/head 
2025-12-04T08:57:43.6475430Z * [new branch] gh/bdhirsh/672/orig -> origin/gh/bdhirsh/672/orig 2025-12-04T08:57:43.6476040Z * [new branch] gh/bdhirsh/675/base -> origin/gh/bdhirsh/675/base 2025-12-04T08:57:43.6476635Z * [new branch] gh/bdhirsh/675/head -> origin/gh/bdhirsh/675/head 2025-12-04T08:57:43.6477240Z * [new branch] gh/bdhirsh/675/orig -> origin/gh/bdhirsh/675/orig 2025-12-04T08:57:43.6477861Z * [new branch] gh/bdhirsh/676/base -> origin/gh/bdhirsh/676/base 2025-12-04T08:57:43.6478473Z * [new branch] gh/bdhirsh/676/head -> origin/gh/bdhirsh/676/head 2025-12-04T08:57:43.6479077Z * [new branch] gh/bdhirsh/676/orig -> origin/gh/bdhirsh/676/orig 2025-12-04T08:57:43.6479685Z * [new branch] gh/bdhirsh/677/base -> origin/gh/bdhirsh/677/base 2025-12-04T08:57:43.6479915Z * [new branch] gh/bdhirsh/677/head -> origin/gh/bdhirsh/677/head 2025-12-04T08:57:43.6480161Z * [new branch] gh/bdhirsh/677/orig -> origin/gh/bdhirsh/677/orig 2025-12-04T08:57:43.6480392Z * [new branch] gh/bdhirsh/678/base -> origin/gh/bdhirsh/678/base 2025-12-04T08:57:43.6480635Z * [new branch] gh/bdhirsh/678/head -> origin/gh/bdhirsh/678/head 2025-12-04T08:57:43.6480868Z * [new branch] gh/bdhirsh/678/orig -> origin/gh/bdhirsh/678/orig 2025-12-04T08:57:43.6481118Z * [new branch] gh/bdhirsh/679/base -> origin/gh/bdhirsh/679/base 2025-12-04T08:57:43.6482193Z * [new branch] gh/bdhirsh/679/head -> origin/gh/bdhirsh/679/head 2025-12-04T08:57:43.6483540Z * [new branch] gh/bdhirsh/679/orig -> origin/gh/bdhirsh/679/orig 2025-12-04T08:57:43.6485038Z * [new branch] gh/bdhirsh/680/base -> origin/gh/bdhirsh/680/base 2025-12-04T08:57:43.6486236Z * [new branch] gh/bdhirsh/680/head -> origin/gh/bdhirsh/680/head 2025-12-04T08:57:43.6487336Z * [new branch] gh/bdhirsh/680/orig -> origin/gh/bdhirsh/680/orig 2025-12-04T08:57:43.6488792Z * [new branch] gh/bdhirsh/681/base -> origin/gh/bdhirsh/681/base 2025-12-04T08:57:43.6489979Z * [new branch] gh/bdhirsh/681/head -> origin/gh/bdhirsh/681/head 2025-12-04T08:57:43.6491100Z * [new branch] gh/bdhirsh/681/orig -> origin/gh/bdhirsh/681/orig 2025-12-04T08:57:43.6493091Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-12-04T08:57:43.6494110Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-12-04T08:57:43.6495262Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-12-04T08:57:43.6496951Z * [new branch] gh/benjaminglass1/102/base -> origin/gh/benjaminglass1/102/base 2025-12-04T08:57:43.6498174Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-12-04T08:57:43.6499277Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-12-04T08:57:43.6500765Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-12-04T08:57:43.6501889Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-12-04T08:57:43.6503057Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-12-04T08:57:43.6504624Z * [new branch] gh/benjaminglass1/107/base -> origin/gh/benjaminglass1/107/base 2025-12-04T08:57:43.6505769Z * [new branch] gh/benjaminglass1/107/head -> origin/gh/benjaminglass1/107/head 2025-12-04T08:57:43.6506903Z * [new branch] gh/benjaminglass1/107/orig -> origin/gh/benjaminglass1/107/orig 2025-12-04T08:57:43.6508387Z * [new branch] gh/benjaminglass1/108/base -> origin/gh/benjaminglass1/108/base 2025-12-04T08:57:43.6509571Z * [new branch] gh/benjaminglass1/108/head -> 
origin/gh/benjaminglass1/108/head 2025-12-04T08:57:43.6510667Z * [new branch] gh/benjaminglass1/108/orig -> origin/gh/benjaminglass1/108/orig 2025-12-04T08:57:43.6512124Z * [new branch] gh/benjaminglass1/109/base -> origin/gh/benjaminglass1/109/base 2025-12-04T08:57:43.6513242Z * [new branch] gh/benjaminglass1/109/head -> origin/gh/benjaminglass1/109/head 2025-12-04T08:57:43.6514377Z * [new branch] gh/benjaminglass1/109/orig -> origin/gh/benjaminglass1/109/orig 2025-12-04T08:57:43.6515794Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-12-04T08:57:43.6516877Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-12-04T08:57:43.6518003Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-12-04T08:57:43.6519822Z * [new branch] gh/bobrenjc93/570/base -> origin/gh/bobrenjc93/570/base 2025-12-04T08:57:43.6521085Z * [new branch] gh/bobrenjc93/570/head -> origin/gh/bobrenjc93/570/head 2025-12-04T08:57:43.6524444Z * [new branch] gh/bobrenjc93/570/orig -> origin/gh/bobrenjc93/570/orig 2025-12-04T08:57:43.6525842Z * [new branch] gh/bobrenjc93/604/base -> origin/gh/bobrenjc93/604/base 2025-12-04T08:57:43.6526963Z * [new branch] gh/bobrenjc93/604/head -> origin/gh/bobrenjc93/604/head 2025-12-04T08:57:43.6528113Z * [new branch] gh/bobrenjc93/604/orig -> origin/gh/bobrenjc93/604/orig 2025-12-04T08:57:43.6529627Z * [new branch] gh/bobrenjc93/638/base -> origin/gh/bobrenjc93/638/base 2025-12-04T08:57:43.6530844Z * [new branch] gh/bobrenjc93/638/head -> origin/gh/bobrenjc93/638/head 2025-12-04T08:57:43.6531961Z * [new branch] gh/bobrenjc93/638/orig -> origin/gh/bobrenjc93/638/orig 2025-12-04T08:57:43.6533542Z * [new branch] gh/bobrenjc93/653/base -> origin/gh/bobrenjc93/653/base 2025-12-04T08:57:43.6534638Z * [new branch] gh/bobrenjc93/653/head -> origin/gh/bobrenjc93/653/head 2025-12-04T08:57:43.6535815Z * [new branch] gh/bobrenjc93/653/orig -> origin/gh/bobrenjc93/653/orig 2025-12-04T08:57:43.6537684Z * [new branch] gh/bobrenjc93/654/base -> origin/gh/bobrenjc93/654/base 2025-12-04T08:57:43.6538995Z * [new branch] gh/bobrenjc93/654/head -> origin/gh/bobrenjc93/654/head 2025-12-04T08:57:43.6539898Z * [new branch] gh/bobrenjc93/654/orig -> origin/gh/bobrenjc93/654/orig 2025-12-04T08:57:43.6541443Z * [new branch] gh/bobrenjc93/657/base -> origin/gh/bobrenjc93/657/base 2025-12-04T08:57:43.6542496Z * [new branch] gh/bobrenjc93/657/head -> origin/gh/bobrenjc93/657/head 2025-12-04T08:57:43.6543614Z * [new branch] gh/bobrenjc93/657/orig -> origin/gh/bobrenjc93/657/orig 2025-12-04T08:57:43.6545128Z * [new branch] gh/bobrenjc93/672/base -> origin/gh/bobrenjc93/672/base 2025-12-04T08:57:43.6546185Z * [new branch] gh/bobrenjc93/672/head -> origin/gh/bobrenjc93/672/head 2025-12-04T08:57:43.6547322Z * [new branch] gh/bobrenjc93/672/orig -> origin/gh/bobrenjc93/672/orig 2025-12-04T08:57:43.6548981Z * [new branch] gh/bobrenjc93/679/base -> origin/gh/bobrenjc93/679/base 2025-12-04T08:57:43.6550266Z * [new branch] gh/bobrenjc93/679/head -> origin/gh/bobrenjc93/679/head 2025-12-04T08:57:43.6551588Z * [new branch] gh/bobrenjc93/679/orig -> origin/gh/bobrenjc93/679/orig 2025-12-04T08:57:43.6553111Z * [new branch] gh/bobrenjc93/680/base -> origin/gh/bobrenjc93/680/base 2025-12-04T08:57:43.6554174Z * [new branch] gh/bobrenjc93/680/head -> origin/gh/bobrenjc93/680/head 2025-12-04T08:57:43.6555333Z * [new branch] gh/bobrenjc93/680/orig -> origin/gh/bobrenjc93/680/orig 2025-12-04T08:57:43.6556638Z * [new branch] gh/bobrenjc93/681/base -> 
origin/gh/bobrenjc93/681/base 2025-12-04T08:57:43.6557719Z * [new branch] gh/bobrenjc93/681/head -> origin/gh/bobrenjc93/681/head 2025-12-04T08:57:43.6558850Z * [new branch] gh/bobrenjc93/681/orig -> origin/gh/bobrenjc93/681/orig 2025-12-04T08:57:43.6560172Z * [new branch] gh/bobrenjc93/682/base -> origin/gh/bobrenjc93/682/base 2025-12-04T08:57:43.6561290Z * [new branch] gh/bobrenjc93/682/head -> origin/gh/bobrenjc93/682/head 2025-12-04T08:57:43.6562409Z * [new branch] gh/bobrenjc93/682/orig -> origin/gh/bobrenjc93/682/orig 2025-12-04T08:57:43.6563860Z * [new branch] gh/bobrenjc93/683/base -> origin/gh/bobrenjc93/683/base 2025-12-04T08:57:43.6564947Z * [new branch] gh/bobrenjc93/683/head -> origin/gh/bobrenjc93/683/head 2025-12-04T08:57:43.6566158Z * [new branch] gh/bobrenjc93/683/orig -> origin/gh/bobrenjc93/683/orig 2025-12-04T08:57:43.6567628Z * [new branch] gh/bobrenjc93/684/base -> origin/gh/bobrenjc93/684/base 2025-12-04T08:57:43.6568890Z * [new branch] gh/bobrenjc93/684/head -> origin/gh/bobrenjc93/684/head 2025-12-04T08:57:43.6570185Z * [new branch] gh/bobrenjc93/684/orig -> origin/gh/bobrenjc93/684/orig 2025-12-04T08:57:43.6571513Z * [new branch] gh/bobrenjc93/685/base -> origin/gh/bobrenjc93/685/base 2025-12-04T08:57:43.6572895Z * [new branch] gh/bobrenjc93/685/head -> origin/gh/bobrenjc93/685/head 2025-12-04T08:57:43.6574266Z * [new branch] gh/bobrenjc93/685/orig -> origin/gh/bobrenjc93/685/orig 2025-12-04T08:57:43.6575921Z * [new branch] gh/bobrenjc93/686/base -> origin/gh/bobrenjc93/686/base 2025-12-04T08:57:43.6580450Z * [new branch] gh/bobrenjc93/686/head -> origin/gh/bobrenjc93/686/head 2025-12-04T08:57:43.6580714Z * [new branch] gh/bobrenjc93/686/orig -> origin/gh/bobrenjc93/686/orig 2025-12-04T08:57:43.6580982Z * [new branch] gh/bobrenjc93/687/base -> origin/gh/bobrenjc93/687/base 2025-12-04T08:57:43.6581574Z * [new branch] gh/bobrenjc93/687/head -> origin/gh/bobrenjc93/687/head 2025-12-04T08:57:43.6582701Z * [new branch] gh/bobrenjc93/687/orig -> origin/gh/bobrenjc93/687/orig 2025-12-04T08:57:43.6584593Z * [new branch] gh/bobrenjc93/688/base -> origin/gh/bobrenjc93/688/base 2025-12-04T08:57:43.6585730Z * [new branch] gh/bobrenjc93/688/head -> origin/gh/bobrenjc93/688/head 2025-12-04T08:57:43.6586840Z * [new branch] gh/bobrenjc93/688/orig -> origin/gh/bobrenjc93/688/orig 2025-12-04T08:57:43.6588246Z * [new branch] gh/bobrenjc93/689/base -> origin/gh/bobrenjc93/689/base 2025-12-04T08:57:43.6589612Z * [new branch] gh/bobrenjc93/689/head -> origin/gh/bobrenjc93/689/head 2025-12-04T08:57:43.6590740Z * [new branch] gh/bobrenjc93/689/orig -> origin/gh/bobrenjc93/689/orig 2025-12-04T08:57:43.6592092Z * [new branch] gh/bobrenjc93/690/base -> origin/gh/bobrenjc93/690/base 2025-12-04T08:57:43.6593188Z * [new branch] gh/bobrenjc93/690/head -> origin/gh/bobrenjc93/690/head 2025-12-04T08:57:43.6594272Z * [new branch] gh/bobrenjc93/690/orig -> origin/gh/bobrenjc93/690/orig 2025-12-04T08:57:43.6596571Z * [new branch] gh/bobrenjc93/691/base -> origin/gh/bobrenjc93/691/base 2025-12-04T08:57:43.6597989Z * [new branch] gh/bobrenjc93/691/head -> origin/gh/bobrenjc93/691/head 2025-12-04T08:57:43.6599982Z * [new branch] gh/bobrenjc93/691/orig -> origin/gh/bobrenjc93/691/orig 2025-12-04T08:57:43.6601868Z * [new branch] gh/bobrenjc93/692/base -> origin/gh/bobrenjc93/692/base 2025-12-04T08:57:43.6602945Z * [new branch] gh/bobrenjc93/692/head -> origin/gh/bobrenjc93/692/head 2025-12-04T08:57:43.6604053Z * [new branch] gh/bobrenjc93/692/orig -> origin/gh/bobrenjc93/692/orig 
2025-12-04T08:57:43.6605381Z * [new branch] gh/bobrenjc93/693/base -> origin/gh/bobrenjc93/693/base 2025-12-04T08:57:43.6606454Z * [new branch] gh/bobrenjc93/693/head -> origin/gh/bobrenjc93/693/head 2025-12-04T08:57:43.6607621Z * [new branch] gh/bobrenjc93/693/orig -> origin/gh/bobrenjc93/693/orig 2025-12-04T08:57:43.6609590Z * [new branch] gh/bobrenjc93/694/base -> origin/gh/bobrenjc93/694/base 2025-12-04T08:57:43.6610729Z * [new branch] gh/bobrenjc93/694/head -> origin/gh/bobrenjc93/694/head 2025-12-04T08:57:43.6611855Z * [new branch] gh/bobrenjc93/694/orig -> origin/gh/bobrenjc93/694/orig 2025-12-04T08:57:43.6613217Z * [new branch] gh/bobrenjc93/695/base -> origin/gh/bobrenjc93/695/base 2025-12-04T08:57:43.6614289Z * [new branch] gh/bobrenjc93/695/head -> origin/gh/bobrenjc93/695/head 2025-12-04T08:57:43.6615518Z * [new branch] gh/bobrenjc93/695/orig -> origin/gh/bobrenjc93/695/orig 2025-12-04T08:57:43.6617675Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-12-04T08:57:43.6618827Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-12-04T08:57:43.6620648Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-12-04T08:57:43.6622007Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-12-04T08:57:43.6623101Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-12-04T08:57:43.6624449Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-12-04T08:57:43.6625602Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-12-04T08:57:43.6626777Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-12-04T08:57:43.6628246Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-12-04T08:57:43.6629360Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-12-04T08:57:43.6630584Z * [new branch] gh/c00w/56/orig -> origin/gh/c00w/56/orig 2025-12-04T08:57:43.6632234Z * [new branch] gh/c00w/57/base -> origin/gh/c00w/57/base 2025-12-04T08:57:43.6633366Z * [new branch] gh/c00w/57/head -> origin/gh/c00w/57/head 2025-12-04T08:57:43.6634530Z * [new branch] gh/c00w/57/orig -> origin/gh/c00w/57/orig 2025-12-04T08:57:43.6635894Z * [new branch] gh/c00w/58/base -> origin/gh/c00w/58/base 2025-12-04T08:57:43.6636959Z * [new branch] gh/c00w/58/head -> origin/gh/c00w/58/head 2025-12-04T08:57:43.6638149Z * [new branch] gh/c00w/58/orig -> origin/gh/c00w/58/orig 2025-12-04T08:57:43.6639876Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-12-04T08:57:43.6641056Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-12-04T08:57:43.6642128Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-12-04T08:57:43.6644101Z * [new branch] gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-12-04T08:57:43.6645297Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-12-04T08:57:43.6647065Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-12-04T08:57:43.6648168Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-12-04T08:57:43.6649296Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-12-04T08:57:43.6650919Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-12-04T08:57:43.6652177Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-12-04T08:57:43.6653366Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-12-04T08:57:43.6654843Z * [new branch] gh/coconutruben/70/base -> origin/gh/coconutruben/70/base 
2025-12-04T08:57:43.6655965Z * [new branch] gh/coconutruben/70/head -> origin/gh/coconutruben/70/head 2025-12-04T08:57:43.6657596Z * [new branch] gh/coconutruben/70/orig -> origin/gh/coconutruben/70/orig 2025-12-04T08:57:43.6658860Z * [new branch] gh/coconutruben/71/base -> origin/gh/coconutruben/71/base 2025-12-04T08:57:43.6660164Z * [new branch] gh/coconutruben/71/head -> origin/gh/coconutruben/71/head 2025-12-04T08:57:43.6661300Z * [new branch] gh/coconutruben/71/orig -> origin/gh/coconutruben/71/orig 2025-12-04T08:57:43.6662656Z * [new branch] gh/coconutruben/72/base -> origin/gh/coconutruben/72/base 2025-12-04T08:57:43.6663798Z * [new branch] gh/coconutruben/72/head -> origin/gh/coconutruben/72/head 2025-12-04T08:57:43.6664950Z * [new branch] gh/coconutruben/72/orig -> origin/gh/coconutruben/72/orig 2025-12-04T08:57:43.6666321Z * [new branch] gh/coconutruben/73/base -> origin/gh/coconutruben/73/base 2025-12-04T08:57:43.6667651Z * [new branch] gh/coconutruben/73/head -> origin/gh/coconutruben/73/head 2025-12-04T08:57:43.6668671Z * [new branch] gh/coconutruben/73/orig -> origin/gh/coconutruben/73/orig 2025-12-04T08:57:43.6670369Z * [new branch] gh/coconutruben/74/base -> origin/gh/coconutruben/74/base 2025-12-04T08:57:43.6671584Z * [new branch] gh/coconutruben/74/head -> origin/gh/coconutruben/74/head 2025-12-04T08:57:43.6672679Z * [new branch] gh/coconutruben/74/orig -> origin/gh/coconutruben/74/orig 2025-12-04T08:57:43.6674221Z * [new branch] gh/coconutruben/79/base -> origin/gh/coconutruben/79/base 2025-12-04T08:57:43.6675622Z * [new branch] gh/coconutruben/79/head -> origin/gh/coconutruben/79/head 2025-12-04T08:57:43.6676826Z * [new branch] gh/coconutruben/79/orig -> origin/gh/coconutruben/79/orig 2025-12-04T08:57:43.6678221Z * [new branch] gh/coconutruben/80/base -> origin/gh/coconutruben/80/base 2025-12-04T08:57:43.6679440Z * [new branch] gh/coconutruben/80/head -> origin/gh/coconutruben/80/head 2025-12-04T08:57:43.6680574Z * [new branch] gh/coconutruben/80/orig -> origin/gh/coconutruben/80/orig 2025-12-04T08:57:43.6682207Z * [new branch] gh/coconutruben/82/base -> origin/gh/coconutruben/82/base 2025-12-04T08:57:43.6683251Z * [new branch] gh/coconutruben/82/head -> origin/gh/coconutruben/82/head 2025-12-04T08:57:43.6684323Z * [new branch] gh/coconutruben/82/orig -> origin/gh/coconutruben/82/orig 2025-12-04T08:57:43.6685957Z * [new branch] gh/coconutruben/83/base -> origin/gh/coconutruben/83/base 2025-12-04T08:57:43.6687148Z * [new branch] gh/coconutruben/83/head -> origin/gh/coconutruben/83/head 2025-12-04T08:57:43.6688211Z * [new branch] gh/coconutruben/83/orig -> origin/gh/coconutruben/83/orig 2025-12-04T08:57:43.6690117Z * [new branch] gh/coconutruben/84/base -> origin/gh/coconutruben/84/base 2025-12-04T08:57:43.6691116Z * [new branch] gh/coconutruben/84/head -> origin/gh/coconutruben/84/head 2025-12-04T08:57:43.6692252Z * [new branch] gh/coconutruben/84/orig -> origin/gh/coconutruben/84/orig 2025-12-04T08:57:43.6693717Z * [new branch] gh/coconutruben/85/base -> origin/gh/coconutruben/85/base 2025-12-04T08:57:43.6694847Z * [new branch] gh/coconutruben/85/head -> origin/gh/coconutruben/85/head 2025-12-04T08:57:43.6696014Z * [new branch] gh/coconutruben/85/orig -> origin/gh/coconutruben/85/orig 2025-12-04T08:57:43.6697892Z * [new branch] gh/coconutruben/86/base -> origin/gh/coconutruben/86/base 2025-12-04T08:57:43.6699063Z * [new branch] gh/coconutruben/86/head -> origin/gh/coconutruben/86/head 2025-12-04T08:57:43.6700214Z * [new branch] gh/coconutruben/86/orig -> 
origin/gh/coconutruben/86/orig 2025-12-04T08:57:43.6702054Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-12-04T08:57:43.6703221Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-12-04T08:57:43.6704589Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-12-04T08:57:43.6705811Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-12-04T08:57:43.6707157Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-12-04T08:57:43.6708198Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-12-04T08:57:43.6709606Z * [new branch] gh/colinchan15/6/base -> origin/gh/colinchan15/6/base 2025-12-04T08:57:43.6710679Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-12-04T08:57:43.6712415Z * [new branch] gh/d4l3k/1/base -> origin/gh/d4l3k/1/base 2025-12-04T08:57:43.6713516Z * [new branch] gh/d4l3k/1/head -> origin/gh/d4l3k/1/head 2025-12-04T08:57:43.6714972Z * [new branch] gh/d4l3k/2/base -> origin/gh/d4l3k/2/base 2025-12-04T08:57:43.6716060Z * [new branch] gh/d4l3k/2/head -> origin/gh/d4l3k/2/head 2025-12-04T08:57:43.6717146Z * [new branch] gh/d4l3k/2/orig -> origin/gh/d4l3k/2/orig 2025-12-04T08:57:43.6718697Z * [new branch] gh/d4l3k/3/base -> origin/gh/d4l3k/3/base 2025-12-04T08:57:43.6719758Z * [new branch] gh/d4l3k/3/head -> origin/gh/d4l3k/3/head 2025-12-04T08:57:43.6721038Z * [new branch] gh/d4l3k/3/orig -> origin/gh/d4l3k/3/orig 2025-12-04T08:57:43.6723056Z * [new branch] gh/d4l3k/4/base -> origin/gh/d4l3k/4/base 2025-12-04T08:57:43.6724093Z * [new branch] gh/d4l3k/4/head -> origin/gh/d4l3k/4/head 2025-12-04T08:57:43.6725239Z * [new branch] gh/d4l3k/4/orig -> origin/gh/d4l3k/4/orig 2025-12-04T08:57:43.6726669Z * [new branch] gh/d4l3k/5/base -> origin/gh/d4l3k/5/base 2025-12-04T08:57:43.6727841Z * [new branch] gh/d4l3k/5/orig -> origin/gh/d4l3k/5/orig 2025-12-04T08:57:43.6729684Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-12-04T08:57:43.6730820Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-12-04T08:57:43.6731950Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-12-04T08:57:43.6733766Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-12-04T08:57:43.6734910Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-12-04T08:57:43.6736041Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-12-04T08:57:43.6738114Z * [new branch] gh/desertfire/605/base -> origin/gh/desertfire/605/base 2025-12-04T08:57:43.6739220Z * [new branch] gh/desertfire/605/head -> origin/gh/desertfire/605/head 2025-12-04T08:57:43.6740423Z * [new branch] gh/desertfire/605/orig -> origin/gh/desertfire/605/orig 2025-12-04T08:57:43.6741938Z * [new branch] gh/desertfire/606/base -> origin/gh/desertfire/606/base 2025-12-04T08:57:43.6743019Z * [new branch] gh/desertfire/606/head -> origin/gh/desertfire/606/head 2025-12-04T08:57:43.6744284Z * [new branch] gh/desertfire/606/orig -> origin/gh/desertfire/606/orig 2025-12-04T08:57:43.6745765Z * [new branch] gh/desertfire/607/base -> origin/gh/desertfire/607/base 2025-12-04T08:57:43.6746870Z * [new branch] gh/desertfire/607/head -> origin/gh/desertfire/607/head 2025-12-04T08:57:43.6748043Z * [new branch] gh/desertfire/607/orig -> origin/gh/desertfire/607/orig 2025-12-04T08:57:43.6749751Z * [new branch] gh/desertfire/608/base -> 
origin/gh/desertfire/608/base 2025-12-04T08:57:43.6750815Z * [new branch] gh/desertfire/608/head -> origin/gh/desertfire/608/head 2025-12-04T08:57:43.6751948Z * [new branch] gh/desertfire/608/orig -> origin/gh/desertfire/608/orig 2025-12-04T08:57:43.6753388Z * [new branch] gh/desertfire/609/base -> origin/gh/desertfire/609/base 2025-12-04T08:57:43.6754617Z * [new branch] gh/desertfire/609/head -> origin/gh/desertfire/609/head 2025-12-04T08:57:43.6755730Z * [new branch] gh/desertfire/609/orig -> origin/gh/desertfire/609/orig 2025-12-04T08:57:43.6757408Z * [new branch] gh/desertfire/610/base -> origin/gh/desertfire/610/base 2025-12-04T08:57:43.6758539Z * [new branch] gh/desertfire/610/head -> origin/gh/desertfire/610/head 2025-12-04T08:57:43.6759690Z * [new branch] gh/desertfire/610/orig -> origin/gh/desertfire/610/orig 2025-12-04T08:57:43.6761075Z * [new branch] gh/desertfire/611/base -> origin/gh/desertfire/611/base 2025-12-04T08:57:43.6762276Z * [new branch] gh/desertfire/611/head -> origin/gh/desertfire/611/head 2025-12-04T08:57:43.6763432Z * [new branch] gh/desertfire/611/orig -> origin/gh/desertfire/611/orig 2025-12-04T08:57:43.6765033Z * [new branch] gh/desertfire/612/base -> origin/gh/desertfire/612/base 2025-12-04T08:57:43.6766113Z * [new branch] gh/desertfire/612/head -> origin/gh/desertfire/612/head 2025-12-04T08:57:43.6767233Z * [new branch] gh/desertfire/612/orig -> origin/gh/desertfire/612/orig 2025-12-04T08:57:43.6768808Z * [new branch] gh/desertfire/613/base -> origin/gh/desertfire/613/base 2025-12-04T08:57:43.6769909Z * [new branch] gh/desertfire/613/head -> origin/gh/desertfire/613/head 2025-12-04T08:57:43.6770990Z * [new branch] gh/desertfire/613/orig -> origin/gh/desertfire/613/orig 2025-12-04T08:57:43.6772570Z * [new branch] gh/desertfire/614/base -> origin/gh/desertfire/614/base 2025-12-04T08:57:43.6773790Z * [new branch] gh/desertfire/614/head -> origin/gh/desertfire/614/head 2025-12-04T08:57:43.6774909Z * [new branch] gh/desertfire/614/orig -> origin/gh/desertfire/614/orig 2025-12-04T08:57:43.6776487Z * [new branch] gh/desertfire/615/base -> origin/gh/desertfire/615/base 2025-12-04T08:57:43.6778206Z * [new branch] gh/desertfire/615/head -> origin/gh/desertfire/615/head 2025-12-04T08:57:43.6779319Z * [new branch] gh/desertfire/615/orig -> origin/gh/desertfire/615/orig 2025-12-04T08:57:43.6780722Z * [new branch] gh/desertfire/616/base -> origin/gh/desertfire/616/base 2025-12-04T08:57:43.6781965Z * [new branch] gh/desertfire/616/head -> origin/gh/desertfire/616/head 2025-12-04T08:57:43.6782997Z * [new branch] gh/desertfire/616/orig -> origin/gh/desertfire/616/orig 2025-12-04T08:57:43.6784356Z * [new branch] gh/desertfire/617/base -> origin/gh/desertfire/617/base 2025-12-04T08:57:43.6785623Z * [new branch] gh/desertfire/617/head -> origin/gh/desertfire/617/head 2025-12-04T08:57:43.6786669Z * [new branch] gh/desertfire/617/orig -> origin/gh/desertfire/617/orig 2025-12-04T08:57:43.6788444Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-12-04T08:57:43.6789727Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 2025-12-04T08:57:43.6791481Z * [new branch] gh/drisspg/170/base -> origin/gh/drisspg/170/base 2025-12-04T08:57:43.6792576Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-12-04T08:57:43.6793761Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-12-04T08:57:43.6795199Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-12-04T08:57:43.6796330Z * [new branch] gh/drisspg/182/head -> 
origin/gh/drisspg/182/head 2025-12-04T08:57:43.6797615Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-12-04T08:57:43.6798621Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-12-04T08:57:43.6799908Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-12-04T08:57:43.6800910Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-12-04T08:57:43.6802402Z * [new branch] gh/drisspg/185/base -> origin/gh/drisspg/185/base 2025-12-04T08:57:43.6803582Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-12-04T08:57:43.6804942Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-12-04T08:57:43.6806045Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-12-04T08:57:43.6807227Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-12-04T08:57:43.6808658Z * [new branch] gh/drisspg/200/base -> origin/gh/drisspg/200/base 2025-12-04T08:57:43.6809749Z * [new branch] gh/drisspg/200/head -> origin/gh/drisspg/200/head 2025-12-04T08:57:43.6810823Z * [new branch] gh/drisspg/200/orig -> origin/gh/drisspg/200/orig 2025-12-04T08:57:43.6812275Z * [new branch] gh/drisspg/218/base -> origin/gh/drisspg/218/base 2025-12-04T08:57:43.6813454Z * [new branch] gh/drisspg/218/head -> origin/gh/drisspg/218/head 2025-12-04T08:57:43.6814462Z * [new branch] gh/drisspg/218/orig -> origin/gh/drisspg/218/orig 2025-12-04T08:57:43.6815874Z * [new branch] gh/drisspg/219/base -> origin/gh/drisspg/219/base 2025-12-04T08:57:43.6817324Z * [new branch] gh/drisspg/219/head -> origin/gh/drisspg/219/head 2025-12-04T08:57:43.6818438Z * [new branch] gh/drisspg/219/orig -> origin/gh/drisspg/219/orig 2025-12-04T08:57:43.6819973Z * [new branch] gh/drisspg/220/base -> origin/gh/drisspg/220/base 2025-12-04T08:57:43.6821312Z * [new branch] gh/drisspg/220/head -> origin/gh/drisspg/220/head 2025-12-04T08:57:43.6822600Z * [new branch] gh/drisspg/220/orig -> origin/gh/drisspg/220/orig 2025-12-04T08:57:43.6824079Z * [new branch] gh/drisspg/221/base -> origin/gh/drisspg/221/base 2025-12-04T08:57:43.6825186Z * [new branch] gh/drisspg/221/head -> origin/gh/drisspg/221/head 2025-12-04T08:57:43.6826297Z * [new branch] gh/drisspg/221/orig -> origin/gh/drisspg/221/orig 2025-12-04T08:57:43.6827767Z * [new branch] gh/drisspg/222/base -> origin/gh/drisspg/222/base 2025-12-04T08:57:43.6828870Z * [new branch] gh/drisspg/222/head -> origin/gh/drisspg/222/head 2025-12-04T08:57:43.6829999Z * [new branch] gh/drisspg/222/orig -> origin/gh/drisspg/222/orig 2025-12-04T08:57:43.6831503Z * [new branch] gh/drisspg/223/base -> origin/gh/drisspg/223/base 2025-12-04T08:57:43.6832622Z * [new branch] gh/drisspg/223/head -> origin/gh/drisspg/223/head 2025-12-04T08:57:43.6833807Z * [new branch] gh/drisspg/223/orig -> origin/gh/drisspg/223/orig 2025-12-04T08:57:43.6835294Z * [new branch] gh/drisspg/224/base -> origin/gh/drisspg/224/base 2025-12-04T08:57:43.6836358Z * [new branch] gh/drisspg/224/head -> origin/gh/drisspg/224/head 2025-12-04T08:57:43.6837522Z * [new branch] gh/drisspg/224/orig -> origin/gh/drisspg/224/orig 2025-12-04T08:57:43.6838976Z * [new branch] gh/drisspg/225/base -> origin/gh/drisspg/225/base 2025-12-04T08:57:43.6840044Z * [new branch] gh/drisspg/225/head -> origin/gh/drisspg/225/head 2025-12-04T08:57:43.6841198Z * [new branch] gh/drisspg/225/orig -> origin/gh/drisspg/225/orig 2025-12-04T08:57:43.6842609Z * [new branch] gh/drisspg/226/base -> origin/gh/drisspg/226/base 2025-12-04T08:57:43.6843662Z * [new branch] 
gh/drisspg/226/head -> origin/gh/drisspg/226/head 2025-12-04T08:57:43.6844753Z * [new branch] gh/drisspg/226/orig -> origin/gh/drisspg/226/orig 2025-12-04T08:57:43.6846631Z * [new branch] gh/drisspg/227/base -> origin/gh/drisspg/227/base 2025-12-04T08:57:43.6847723Z * [new branch] gh/drisspg/227/head -> origin/gh/drisspg/227/head 2025-12-04T08:57:43.6848792Z * [new branch] gh/drisspg/227/orig -> origin/gh/drisspg/227/orig 2025-12-04T08:57:43.6850265Z * [new branch] gh/drisspg/228/base -> origin/gh/drisspg/228/base 2025-12-04T08:57:43.6851369Z * [new branch] gh/drisspg/228/head -> origin/gh/drisspg/228/head 2025-12-04T08:57:43.6852521Z * [new branch] gh/drisspg/228/orig -> origin/gh/drisspg/228/orig 2025-12-04T08:57:43.6854007Z * [new branch] gh/drisspg/229/base -> origin/gh/drisspg/229/base 2025-12-04T08:57:43.6855056Z * [new branch] gh/drisspg/229/head -> origin/gh/drisspg/229/head 2025-12-04T08:57:43.6856154Z * [new branch] gh/drisspg/229/orig -> origin/gh/drisspg/229/orig 2025-12-04T08:57:43.6858063Z * [new branch] gh/drisspg/230/base -> origin/gh/drisspg/230/base 2025-12-04T08:57:43.6859013Z * [new branch] gh/drisspg/230/head -> origin/gh/drisspg/230/head 2025-12-04T08:57:43.6860282Z * [new branch] gh/drisspg/230/orig -> origin/gh/drisspg/230/orig 2025-12-04T08:57:43.6862076Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-12-04T08:57:43.6863244Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-12-04T08:57:43.6865079Z * [new branch] gh/dzmitry-huba/1/base -> origin/gh/dzmitry-huba/1/base 2025-12-04T08:57:43.6866341Z * [new branch] gh/dzmitry-huba/1/head -> origin/gh/dzmitry-huba/1/head 2025-12-04T08:57:43.6867894Z * [new branch] gh/dzmitry-huba/12/base -> origin/gh/dzmitry-huba/12/base 2025-12-04T08:57:43.6869341Z * [new branch] gh/dzmitry-huba/12/head -> origin/gh/dzmitry-huba/12/head 2025-12-04T08:57:43.6870497Z * [new branch] gh/dzmitry-huba/12/orig -> origin/gh/dzmitry-huba/12/orig 2025-12-04T08:57:43.6872107Z * [new branch] gh/dzmitry-huba/13/base -> origin/gh/dzmitry-huba/13/base 2025-12-04T08:57:43.6873284Z * [new branch] gh/dzmitry-huba/13/head -> origin/gh/dzmitry-huba/13/head 2025-12-04T08:57:43.6874367Z * [new branch] gh/dzmitry-huba/13/orig -> origin/gh/dzmitry-huba/13/orig 2025-12-04T08:57:43.6875838Z * [new branch] gh/dzmitry-huba/14/base -> origin/gh/dzmitry-huba/14/base 2025-12-04T08:57:43.6876926Z * [new branch] gh/dzmitry-huba/14/head -> origin/gh/dzmitry-huba/14/head 2025-12-04T08:57:43.6878031Z * [new branch] gh/dzmitry-huba/14/orig -> origin/gh/dzmitry-huba/14/orig 2025-12-04T08:57:43.6879618Z * [new branch] gh/dzmitry-huba/15/base -> origin/gh/dzmitry-huba/15/base 2025-12-04T08:57:43.6880689Z * [new branch] gh/dzmitry-huba/15/head -> origin/gh/dzmitry-huba/15/head 2025-12-04T08:57:43.6881875Z * [new branch] gh/dzmitry-huba/15/orig -> origin/gh/dzmitry-huba/15/orig 2025-12-04T08:57:43.6883495Z * [new branch] gh/dzmitry-huba/16/base -> origin/gh/dzmitry-huba/16/base 2025-12-04T08:57:43.6884854Z * [new branch] gh/dzmitry-huba/16/head -> origin/gh/dzmitry-huba/16/head 2025-12-04T08:57:43.6886014Z * [new branch] gh/dzmitry-huba/16/orig -> origin/gh/dzmitry-huba/16/orig 2025-12-04T08:57:43.6887506Z * [new branch] gh/dzmitry-huba/17/base -> origin/gh/dzmitry-huba/17/base 2025-12-04T08:57:43.6888629Z * [new branch] gh/dzmitry-huba/17/head -> origin/gh/dzmitry-huba/17/head 2025-12-04T08:57:43.6889716Z * [new branch] gh/dzmitry-huba/17/orig -> origin/gh/dzmitry-huba/17/orig 2025-12-04T08:57:43.6891034Z * [new branch] 
gh/dzmitry-huba/2/base -> origin/gh/dzmitry-huba/2/base 2025-12-04T08:57:43.6892062Z * [new branch] gh/dzmitry-huba/2/head -> origin/gh/dzmitry-huba/2/head 2025-12-04T08:57:43.6893370Z * [new branch] gh/dzmitry-huba/3/base -> origin/gh/dzmitry-huba/3/base 2025-12-04T08:57:43.6894370Z * [new branch] gh/dzmitry-huba/3/head -> origin/gh/dzmitry-huba/3/head 2025-12-04T08:57:43.6896429Z * [new branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-12-04T08:57:43.6898001Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-12-04T08:57:43.6899096Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-12-04T08:57:43.6900930Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-12-04T08:57:43.6902099Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-12-04T08:57:43.6903294Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-12-04T08:57:43.6904741Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-12-04T08:57:43.6905876Z * [new branch] gh/eellison/823/head -> origin/gh/eellison/823/head 2025-12-04T08:57:43.6906994Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-12-04T08:57:43.6908458Z * [new branch] gh/eellison/862/base -> origin/gh/eellison/862/base 2025-12-04T08:57:43.6909693Z * [new branch] gh/eellison/862/head -> origin/gh/eellison/862/head 2025-12-04T08:57:43.6910748Z * [new branch] gh/eellison/862/orig -> origin/gh/eellison/862/orig 2025-12-04T08:57:43.6912297Z * [new branch] gh/eellison/863/base -> origin/gh/eellison/863/base 2025-12-04T08:57:43.6913362Z * [new branch] gh/eellison/863/head -> origin/gh/eellison/863/head 2025-12-04T08:57:43.6914445Z * [new branch] gh/eellison/863/orig -> origin/gh/eellison/863/orig 2025-12-04T08:57:43.6915796Z * [new branch] gh/eellison/864/base -> origin/gh/eellison/864/base 2025-12-04T08:57:43.6916914Z * [new branch] gh/eellison/864/head -> origin/gh/eellison/864/head 2025-12-04T08:57:43.6918105Z * [new branch] gh/eellison/864/orig -> origin/gh/eellison/864/orig 2025-12-04T08:57:43.6919533Z * [new branch] gh/eellison/865/base -> origin/gh/eellison/865/base 2025-12-04T08:57:43.6920594Z * [new branch] gh/eellison/865/head -> origin/gh/eellison/865/head 2025-12-04T08:57:43.6922307Z * [new branch] gh/eellison/865/orig -> origin/gh/eellison/865/orig 2025-12-04T08:57:43.6923857Z * [new branch] gh/eellison/866/base -> origin/gh/eellison/866/base 2025-12-04T08:57:43.6924864Z * [new branch] gh/eellison/866/head -> origin/gh/eellison/866/head 2025-12-04T08:57:43.6925986Z * [new branch] gh/eellison/866/orig -> origin/gh/eellison/866/orig 2025-12-04T08:57:43.6927685Z * [new branch] gh/eellison/867/base -> origin/gh/eellison/867/base 2025-12-04T08:57:43.6928986Z * [new branch] gh/eellison/867/head -> origin/gh/eellison/867/head 2025-12-04T08:57:43.6929980Z * [new branch] gh/eellison/867/orig -> origin/gh/eellison/867/orig 2025-12-04T08:57:43.6931735Z * [new branch] gh/eellison/868/base -> origin/gh/eellison/868/base 2025-12-04T08:57:43.6933199Z * [new branch] gh/eellison/868/head -> origin/gh/eellison/868/head 2025-12-04T08:57:43.6934438Z * [new branch] gh/eellison/868/orig -> origin/gh/eellison/868/orig 2025-12-04T08:57:43.6936409Z * [new branch] gh/eellison/869/base -> origin/gh/eellison/869/base 2025-12-04T08:57:43.6937826Z * [new branch] gh/eellison/869/head -> origin/gh/eellison/869/head 2025-12-04T08:57:43.6938942Z * [new branch] gh/eellison/869/orig -> origin/gh/eellison/869/orig 2025-12-04T08:57:43.6940488Z * 
[new branch] gh/eellison/870/base -> origin/gh/eellison/870/base 2025-12-04T08:57:43.6941577Z * [new branch] gh/eellison/870/head -> origin/gh/eellison/870/head 2025-12-04T08:57:43.6942697Z * [new branch] gh/eellison/870/orig -> origin/gh/eellison/870/orig 2025-12-04T08:57:43.6944395Z * [new branch] gh/eellison/871/base -> origin/gh/eellison/871/base 2025-12-04T08:57:43.6945451Z * [new branch] gh/eellison/871/head -> origin/gh/eellison/871/head 2025-12-04T08:57:43.6946628Z * [new branch] gh/eellison/871/orig -> origin/gh/eellison/871/orig 2025-12-04T08:57:43.6948144Z * [new branch] gh/eellison/872/base -> origin/gh/eellison/872/base 2025-12-04T08:57:43.6949501Z * [new branch] gh/eellison/872/head -> origin/gh/eellison/872/head 2025-12-04T08:57:43.6950513Z * [new branch] gh/eellison/872/orig -> origin/gh/eellison/872/orig 2025-12-04T08:57:43.6952287Z * [new branch] gh/eellison/873/base -> origin/gh/eellison/873/base 2025-12-04T08:57:43.6953342Z * [new branch] gh/eellison/873/head -> origin/gh/eellison/873/head 2025-12-04T08:57:43.6954435Z * [new branch] gh/eellison/873/orig -> origin/gh/eellison/873/orig 2025-12-04T08:57:43.6955886Z * [new branch] gh/eellison/874/base -> origin/gh/eellison/874/base 2025-12-04T08:57:43.6957031Z * [new branch] gh/eellison/874/head -> origin/gh/eellison/874/head 2025-12-04T08:57:43.6958122Z * [new branch] gh/eellison/874/orig -> origin/gh/eellison/874/orig 2025-12-04T08:57:43.6960147Z * [new branch] gh/eellison/875/base -> origin/gh/eellison/875/base 2025-12-04T08:57:43.6961407Z * [new branch] gh/eellison/875/head -> origin/gh/eellison/875/head 2025-12-04T08:57:43.6962488Z * [new branch] gh/eellison/875/orig -> origin/gh/eellison/875/orig 2025-12-04T08:57:43.6964083Z * [new branch] gh/eellison/876/base -> origin/gh/eellison/876/base 2025-12-04T08:57:43.6965177Z * [new branch] gh/eellison/876/head -> origin/gh/eellison/876/head 2025-12-04T08:57:43.6966344Z * [new branch] gh/eellison/876/orig -> origin/gh/eellison/876/orig 2025-12-04T08:57:43.6967841Z * [new branch] gh/eellison/877/base -> origin/gh/eellison/877/base 2025-12-04T08:57:43.6968941Z * [new branch] gh/eellison/877/head -> origin/gh/eellison/877/head 2025-12-04T08:57:43.6970006Z * [new branch] gh/eellison/877/orig -> origin/gh/eellison/877/orig 2025-12-04T08:57:43.6971931Z * [new branch] gh/eellison/878/base -> origin/gh/eellison/878/base 2025-12-04T08:57:43.6972573Z * [new branch] gh/eellison/878/head -> origin/gh/eellison/878/head 2025-12-04T08:57:43.6973691Z * [new branch] gh/eellison/878/orig -> origin/gh/eellison/878/orig 2025-12-04T08:57:43.6975297Z * [new branch] gh/eellison/879/base -> origin/gh/eellison/879/base 2025-12-04T08:57:43.6976496Z * [new branch] gh/eellison/879/head -> origin/gh/eellison/879/head 2025-12-04T08:57:43.6977922Z * [new branch] gh/eellison/879/orig -> origin/gh/eellison/879/orig 2025-12-04T08:57:43.6979326Z * [new branch] gh/eellison/880/base -> origin/gh/eellison/880/base 2025-12-04T08:57:43.6980492Z * [new branch] gh/eellison/880/head -> origin/gh/eellison/880/head 2025-12-04T08:57:43.6981660Z * [new branch] gh/eellison/880/orig -> origin/gh/eellison/880/orig 2025-12-04T08:57:43.6983206Z * [new branch] gh/eellison/881/base -> origin/gh/eellison/881/base 2025-12-04T08:57:43.6984339Z * [new branch] gh/eellison/881/head -> origin/gh/eellison/881/head 2025-12-04T08:57:43.6985512Z * [new branch] gh/eellison/881/orig -> origin/gh/eellison/881/orig 2025-12-04T08:57:43.6986990Z * [new branch] gh/eellison/882/base -> origin/gh/eellison/882/base 2025-12-04T08:57:43.6988097Z * 
[new branch] gh/eellison/882/head -> origin/gh/eellison/882/head 2025-12-04T08:57:43.6989481Z * [new branch] gh/eellison/882/orig -> origin/gh/eellison/882/orig 2025-12-04T08:57:43.6990964Z * [new branch] gh/eellison/883/base -> origin/gh/eellison/883/base 2025-12-04T08:57:43.6992048Z * [new branch] gh/eellison/883/head -> origin/gh/eellison/883/head 2025-12-04T08:57:43.6993151Z * [new branch] gh/eellison/883/orig -> origin/gh/eellison/883/orig 2025-12-04T08:57:43.6994540Z * [new branch] gh/eellison/884/base -> origin/gh/eellison/884/base 2025-12-04T08:57:43.6995631Z * [new branch] gh/eellison/884/head -> origin/gh/eellison/884/head 2025-12-04T08:57:43.6996646Z * [new branch] gh/eellison/884/orig -> origin/gh/eellison/884/orig 2025-12-04T08:57:43.6998389Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-12-04T08:57:43.6999509Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-12-04T08:57:43.7001155Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-12-04T08:57:43.7002320Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-12-04T08:57:43.7003423Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-12-04T08:57:43.7004933Z * [new branch] gh/etaf/156/base -> origin/gh/etaf/156/base 2025-12-04T08:57:43.7006043Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-12-04T08:57:43.7007180Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-12-04T08:57:43.7008799Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-12-04T08:57:43.7009976Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-12-04T08:57:43.7011136Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-12-04T08:57:43.7012509Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-12-04T08:57:43.7013730Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-12-04T08:57:43.7014849Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-12-04T08:57:43.7016446Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-12-04T08:57:43.7018035Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-12-04T08:57:43.7019159Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-12-04T08:57:43.7021067Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-12-04T08:57:43.7024313Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-12-04T08:57:43.7025532Z * [new branch] gh/etaf/160/orig -> origin/gh/etaf/160/orig 2025-12-04T08:57:43.7027132Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-12-04T08:57:43.7028330Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-12-04T08:57:43.7029462Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-12-04T08:57:43.7030985Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-12-04T08:57:43.7032285Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-12-04T08:57:43.7033497Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-12-04T08:57:43.7034898Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-12-04T08:57:43.7036040Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-12-04T08:57:43.7037097Z * [new branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-12-04T08:57:43.7038796Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-12-04T08:57:43.7039945Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-12-04T08:57:43.7041039Z * [new branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 
2025-12-04T08:57:43.7042547Z * [new branch] gh/etaf/172/base -> origin/gh/etaf/172/base 2025-12-04T08:57:43.7043698Z * [new branch] gh/etaf/172/head -> origin/gh/etaf/172/head 2025-12-04T08:57:43.7044992Z * [new branch] gh/etaf/172/orig -> origin/gh/etaf/172/orig 2025-12-04T08:57:43.7046581Z * [new branch] gh/etaf/173/base -> origin/gh/etaf/173/base 2025-12-04T08:57:43.7047875Z * [new branch] gh/etaf/173/head -> origin/gh/etaf/173/head 2025-12-04T08:57:43.7048933Z * [new branch] gh/etaf/173/orig -> origin/gh/etaf/173/orig 2025-12-04T08:57:43.7050473Z * [new branch] gh/etaf/174/base -> origin/gh/etaf/174/base 2025-12-04T08:57:43.7051546Z * [new branch] gh/etaf/174/head -> origin/gh/etaf/174/head 2025-12-04T08:57:43.7053120Z * [new branch] gh/etaf/175/base -> origin/gh/etaf/175/base 2025-12-04T08:57:43.7054194Z * [new branch] gh/etaf/175/head -> origin/gh/etaf/175/head 2025-12-04T08:57:43.7055214Z * [new branch] gh/etaf/175/orig -> origin/gh/etaf/175/orig 2025-12-04T08:57:43.7056955Z * [new branch] gh/etaf/176/base -> origin/gh/etaf/176/base 2025-12-04T08:57:43.7058252Z * [new branch] gh/etaf/176/head -> origin/gh/etaf/176/head 2025-12-04T08:57:43.7059411Z * [new branch] gh/etaf/176/orig -> origin/gh/etaf/176/orig 2025-12-04T08:57:43.7061806Z * [new branch] gh/etaf/177/base -> origin/gh/etaf/177/base 2025-12-04T08:57:43.7063097Z * [new branch] gh/etaf/177/head -> origin/gh/etaf/177/head 2025-12-04T08:57:43.7064285Z * [new branch] gh/etaf/177/orig -> origin/gh/etaf/177/orig 2025-12-04T08:57:43.7065948Z * [new branch] gh/etaf/178/base -> origin/gh/etaf/178/base 2025-12-04T08:57:43.7067249Z * [new branch] gh/etaf/178/head -> origin/gh/etaf/178/head 2025-12-04T08:57:43.7068432Z * [new branch] gh/etaf/178/orig -> origin/gh/etaf/178/orig 2025-12-04T08:57:43.7070151Z * [new branch] gh/etaf/179/base -> origin/gh/etaf/179/base 2025-12-04T08:57:43.7071235Z * [new branch] gh/etaf/179/head -> origin/gh/etaf/179/head 2025-12-04T08:57:43.7072336Z * [new branch] gh/etaf/179/orig -> origin/gh/etaf/179/orig 2025-12-04T08:57:43.7073688Z * [new branch] gh/etaf/180/base -> origin/gh/etaf/180/base 2025-12-04T08:57:43.7074756Z * [new branch] gh/etaf/180/head -> origin/gh/etaf/180/head 2025-12-04T08:57:43.7075872Z * [new branch] gh/etaf/180/orig -> origin/gh/etaf/180/orig 2025-12-04T08:57:43.7077673Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-12-04T08:57:43.7078751Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-12-04T08:57:43.7080144Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-12-04T08:57:43.7081172Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-12-04T08:57:43.7082629Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-12-04T08:57:43.7083818Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-12-04T08:57:43.7085417Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-12-04T08:57:43.7086396Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-12-04T08:57:43.7088356Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-12-04T08:57:43.7089467Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-12-04T08:57:43.7090566Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-12-04T08:57:43.7092091Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-12-04T08:57:43.7093098Z * [new branch] gh/ezyang/2973/head -> 
origin/gh/ezyang/2973/head 2025-12-04T08:57:43.7094251Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 2025-12-04T08:57:43.7095673Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-12-04T08:57:43.7097011Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-12-04T08:57:43.7098293Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-12-04T08:57:43.7099783Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-12-04T08:57:43.7100985Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-12-04T08:57:43.7102092Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-12-04T08:57:43.7103561Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-12-04T08:57:43.7104676Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-12-04T08:57:43.7105807Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-12-04T08:57:43.7107235Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-12-04T08:57:43.7108372Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-12-04T08:57:43.7109588Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-12-04T08:57:43.7111005Z * [new branch] gh/ezyang/3143/base -> origin/gh/ezyang/3143/base 2025-12-04T08:57:43.7112074Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-12-04T08:57:43.7113259Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-12-04T08:57:43.7115189Z * [new branch] gh/ezyang/3144/base -> origin/gh/ezyang/3144/base 2025-12-04T08:57:43.7116267Z * [new branch] gh/ezyang/3144/head -> origin/gh/ezyang/3144/head 2025-12-04T08:57:43.7117432Z * [new branch] gh/ezyang/3144/orig -> origin/gh/ezyang/3144/orig 2025-12-04T08:57:43.7118891Z * [new branch] gh/ezyang/3167/base -> origin/gh/ezyang/3167/base 2025-12-04T08:57:43.7119958Z * [new branch] gh/ezyang/3167/head -> origin/gh/ezyang/3167/head 2025-12-04T08:57:43.7121360Z * [new branch] gh/ezyang/3167/orig -> origin/gh/ezyang/3167/orig 2025-12-04T08:57:43.7122928Z * [new branch] gh/ezyang/3173/base -> origin/gh/ezyang/3173/base 2025-12-04T08:57:43.7124041Z * [new branch] gh/ezyang/3173/head -> origin/gh/ezyang/3173/head 2025-12-04T08:57:43.7125236Z * [new branch] gh/ezyang/3173/orig -> origin/gh/ezyang/3173/orig 2025-12-04T08:57:43.7126743Z * [new branch] gh/ezyang/3175/base -> origin/gh/ezyang/3175/base 2025-12-04T08:57:43.7127873Z * [new branch] gh/ezyang/3175/head -> origin/gh/ezyang/3175/head 2025-12-04T08:57:43.7129087Z * [new branch] gh/ezyang/3175/orig -> origin/gh/ezyang/3175/orig 2025-12-04T08:57:43.7130601Z * [new branch] gh/ezyang/3182/base -> origin/gh/ezyang/3182/base 2025-12-04T08:57:43.7131692Z * [new branch] gh/ezyang/3182/head -> origin/gh/ezyang/3182/head 2025-12-04T08:57:43.7132818Z * [new branch] gh/ezyang/3182/orig -> origin/gh/ezyang/3182/orig 2025-12-04T08:57:43.7134407Z * [new branch] gh/ezyang/3185/base -> origin/gh/ezyang/3185/base 2025-12-04T08:57:43.7135495Z * [new branch] gh/ezyang/3185/head -> origin/gh/ezyang/3185/head 2025-12-04T08:57:43.7136617Z * [new branch] gh/ezyang/3185/orig -> origin/gh/ezyang/3185/orig 2025-12-04T08:57:43.7138436Z * [new branch] gh/ezyang/3189/base -> origin/gh/ezyang/3189/base 2025-12-04T08:57:43.7139513Z * [new branch] gh/ezyang/3189/head -> origin/gh/ezyang/3189/head 2025-12-04T08:57:43.7140576Z * [new branch] gh/ezyang/3189/orig -> origin/gh/ezyang/3189/orig 2025-12-04T08:57:43.7142074Z * [new branch] 
gh/ezyang/3191/base -> origin/gh/ezyang/3191/base 2025-12-04T08:57:43.7143175Z * [new branch] gh/ezyang/3191/head -> origin/gh/ezyang/3191/head 2025-12-04T08:57:43.7144394Z * [new branch] gh/ezyang/3191/orig -> origin/gh/ezyang/3191/orig 2025-12-04T08:57:43.7146313Z * [new branch] gh/ezyang/3192/base -> origin/gh/ezyang/3192/base 2025-12-04T08:57:43.7147457Z * [new branch] gh/ezyang/3192/head -> origin/gh/ezyang/3192/head 2025-12-04T08:57:43.7148711Z * [new branch] gh/ezyang/3192/orig -> origin/gh/ezyang/3192/orig 2025-12-04T08:57:43.7150324Z * [new branch] gh/ezyang/3193/base -> origin/gh/ezyang/3193/base 2025-12-04T08:57:43.7151429Z * [new branch] gh/ezyang/3193/head -> origin/gh/ezyang/3193/head 2025-12-04T08:57:43.7152568Z * [new branch] gh/ezyang/3193/orig -> origin/gh/ezyang/3193/orig 2025-12-04T08:57:43.7154173Z * [new branch] gh/ezyang/3194/base -> origin/gh/ezyang/3194/base 2025-12-04T08:57:43.7155259Z * [new branch] gh/ezyang/3194/head -> origin/gh/ezyang/3194/head 2025-12-04T08:57:43.7156340Z * [new branch] gh/ezyang/3194/orig -> origin/gh/ezyang/3194/orig 2025-12-04T08:57:43.7157801Z * [new branch] gh/ezyang/3195/base -> origin/gh/ezyang/3195/base 2025-12-04T08:57:43.7158833Z * [new branch] gh/ezyang/3195/head -> origin/gh/ezyang/3195/head 2025-12-04T08:57:43.7160024Z * [new branch] gh/ezyang/3195/orig -> origin/gh/ezyang/3195/orig 2025-12-04T08:57:43.7161468Z * [new branch] gh/ezyang/3196/base -> origin/gh/ezyang/3196/base 2025-12-04T08:57:43.7163030Z * [new branch] gh/ezyang/3196/head -> origin/gh/ezyang/3196/head 2025-12-04T08:57:43.7164156Z * [new branch] gh/ezyang/3196/orig -> origin/gh/ezyang/3196/orig 2025-12-04T08:57:43.7165646Z * [new branch] gh/ezyang/3197/base -> origin/gh/ezyang/3197/base 2025-12-04T08:57:43.7166750Z * [new branch] gh/ezyang/3197/head -> origin/gh/ezyang/3197/head 2025-12-04T08:57:43.7167905Z * [new branch] gh/ezyang/3197/orig -> origin/gh/ezyang/3197/orig 2025-12-04T08:57:43.7169826Z * [new branch] gh/ezyang/3198/base -> origin/gh/ezyang/3198/base 2025-12-04T08:57:43.7170911Z * [new branch] gh/ezyang/3198/head -> origin/gh/ezyang/3198/head 2025-12-04T08:57:43.7172058Z * [new branch] gh/ezyang/3198/orig -> origin/gh/ezyang/3198/orig 2025-12-04T08:57:43.7173520Z * [new branch] gh/ezyang/3199/base -> origin/gh/ezyang/3199/base 2025-12-04T08:57:43.7174585Z * [new branch] gh/ezyang/3199/head -> origin/gh/ezyang/3199/head 2025-12-04T08:57:43.7175891Z * [new branch] gh/ezyang/3199/orig -> origin/gh/ezyang/3199/orig 2025-12-04T08:57:43.7177684Z * [new branch] gh/ezyang/3200/base -> origin/gh/ezyang/3200/base 2025-12-04T08:57:43.7178862Z * [new branch] gh/ezyang/3200/head -> origin/gh/ezyang/3200/head 2025-12-04T08:57:43.7179989Z * [new branch] gh/ezyang/3200/orig -> origin/gh/ezyang/3200/orig 2025-12-04T08:57:43.7181506Z * [new branch] gh/ezyang/3201/base -> origin/gh/ezyang/3201/base 2025-12-04T08:57:43.7182609Z * [new branch] gh/ezyang/3201/head -> origin/gh/ezyang/3201/head 2025-12-04T08:57:43.7183782Z * [new branch] gh/ezyang/3201/orig -> origin/gh/ezyang/3201/orig 2025-12-04T08:57:43.7185268Z * [new branch] gh/ezyang/3202/base -> origin/gh/ezyang/3202/base 2025-12-04T08:57:43.7186307Z * [new branch] gh/ezyang/3202/head -> origin/gh/ezyang/3202/head 2025-12-04T08:57:43.7187422Z * [new branch] gh/ezyang/3202/orig -> origin/gh/ezyang/3202/orig 2025-12-04T08:57:43.7189050Z * [new branch] gh/ezyang/3203/base -> origin/gh/ezyang/3203/base 2025-12-04T08:57:43.7190124Z * [new branch] gh/ezyang/3203/head -> origin/gh/ezyang/3203/head 
2025-12-04T08:57:43.7191448Z * [new branch] gh/ezyang/3203/orig -> origin/gh/ezyang/3203/orig 2025-12-04T08:57:43.7192912Z * [new branch] gh/ezyang/3204/base -> origin/gh/ezyang/3204/base 2025-12-04T08:57:43.7193999Z * [new branch] gh/ezyang/3204/head -> origin/gh/ezyang/3204/head 2025-12-04T08:57:43.7195088Z * [new branch] gh/ezyang/3204/orig -> origin/gh/ezyang/3204/orig 2025-12-04T08:57:43.7196562Z * [new branch] gh/ezyang/3205/base -> origin/gh/ezyang/3205/base 2025-12-04T08:57:43.7197668Z * [new branch] gh/ezyang/3205/head -> origin/gh/ezyang/3205/head 2025-12-04T08:57:43.7198718Z * [new branch] gh/ezyang/3205/orig -> origin/gh/ezyang/3205/orig 2025-12-04T08:57:43.7200210Z * [new branch] gh/ezyang/3206/base -> origin/gh/ezyang/3206/base 2025-12-04T08:57:43.7201275Z * [new branch] gh/ezyang/3206/head -> origin/gh/ezyang/3206/head 2025-12-04T08:57:43.7202382Z * [new branch] gh/ezyang/3206/orig -> origin/gh/ezyang/3206/orig 2025-12-04T08:57:43.7203833Z * [new branch] gh/ezyang/3207/base -> origin/gh/ezyang/3207/base 2025-12-04T08:57:43.7204944Z * [new branch] gh/ezyang/3207/head -> origin/gh/ezyang/3207/head 2025-12-04T08:57:43.7206170Z * [new branch] gh/ezyang/3207/orig -> origin/gh/ezyang/3207/orig 2025-12-04T08:57:43.7207598Z * [new branch] gh/ezyang/3208/base -> origin/gh/ezyang/3208/base 2025-12-04T08:57:43.7208770Z * [new branch] gh/ezyang/3208/head -> origin/gh/ezyang/3208/head 2025-12-04T08:57:43.7209869Z * [new branch] gh/ezyang/3208/orig -> origin/gh/ezyang/3208/orig 2025-12-04T08:57:43.7211363Z * [new branch] gh/ezyang/3209/base -> origin/gh/ezyang/3209/base 2025-12-04T08:57:43.7212413Z * [new branch] gh/ezyang/3209/head -> origin/gh/ezyang/3209/head 2025-12-04T08:57:43.7213530Z * [new branch] gh/ezyang/3209/orig -> origin/gh/ezyang/3209/orig 2025-12-04T08:57:43.7215272Z * [new branch] gh/fadara01/3/base -> origin/gh/fadara01/3/base 2025-12-04T08:57:43.7216386Z * [new branch] gh/fadara01/3/head -> origin/gh/fadara01/3/head 2025-12-04T08:57:43.7217960Z * [new branch] gh/fadara01/3/orig -> origin/gh/fadara01/3/orig 2025-12-04T08:57:43.7219487Z * [new branch] gh/fadara01/5/base -> origin/gh/fadara01/5/base 2025-12-04T08:57:43.7220740Z * [new branch] gh/fadara01/5/head -> origin/gh/fadara01/5/head 2025-12-04T08:57:43.7222144Z * [new branch] gh/fadara01/5/orig -> origin/gh/fadara01/5/orig 2025-12-04T08:57:43.7223589Z * [new branch] gh/fadara01/6/base -> origin/gh/fadara01/6/base 2025-12-04T08:57:43.7224725Z * [new branch] gh/fadara01/6/head -> origin/gh/fadara01/6/head 2025-12-04T08:57:43.7225844Z * [new branch] gh/fadara01/6/orig -> origin/gh/fadara01/6/orig 2025-12-04T08:57:43.7227332Z * [new branch] gh/fadara01/7/base -> origin/gh/fadara01/7/base 2025-12-04T08:57:43.7228559Z * [new branch] gh/fadara01/7/head -> origin/gh/fadara01/7/head 2025-12-04T08:57:43.7229673Z * [new branch] gh/fadara01/7/orig -> origin/gh/fadara01/7/orig 2025-12-04T08:57:43.7231186Z * [new branch] gh/fadara01/8/base -> origin/gh/fadara01/8/base 2025-12-04T08:57:43.7232317Z * [new branch] gh/fadara01/8/head -> origin/gh/fadara01/8/head 2025-12-04T08:57:43.7233523Z * [new branch] gh/fadara01/8/orig -> origin/gh/fadara01/8/orig 2025-12-04T08:57:43.7234976Z * [new branch] gh/fadara01/9/base -> origin/gh/fadara01/9/base 2025-12-04T08:57:43.7236171Z * [new branch] gh/fadara01/9/head -> origin/gh/fadara01/9/head 2025-12-04T08:57:43.7237345Z * [new branch] gh/fadara01/9/orig -> origin/gh/fadara01/9/orig 2025-12-04T08:57:43.7239521Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 
2025-12-04T08:57:43.7240615Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-12-04T08:57:43.7241677Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-12-04T08:57:43.7243151Z * [new branch] gh/fduwjj/211/base -> origin/gh/fduwjj/211/base 2025-12-04T08:57:43.7244289Z * [new branch] gh/fduwjj/211/head -> origin/gh/fduwjj/211/head 2025-12-04T08:57:43.7245391Z * [new branch] gh/fduwjj/211/orig -> origin/gh/fduwjj/211/orig 2025-12-04T08:57:43.7246858Z * [new branch] gh/fduwjj/212/base -> origin/gh/fduwjj/212/base 2025-12-04T08:57:43.7247958Z * [new branch] gh/fduwjj/212/head -> origin/gh/fduwjj/212/head 2025-12-04T08:57:43.7249110Z * [new branch] gh/fduwjj/212/orig -> origin/gh/fduwjj/212/orig 2025-12-04T08:57:43.7250645Z * [new branch] gh/fduwjj/213/base -> origin/gh/fduwjj/213/base 2025-12-04T08:57:43.7251740Z * [new branch] gh/fduwjj/213/head -> origin/gh/fduwjj/213/head 2025-12-04T08:57:43.7252796Z * [new branch] gh/fduwjj/213/orig -> origin/gh/fduwjj/213/orig 2025-12-04T08:57:43.7254399Z * [new branch] gh/fduwjj/226/base -> origin/gh/fduwjj/226/base 2025-12-04T08:57:43.7255443Z * [new branch] gh/fduwjj/226/head -> origin/gh/fduwjj/226/head 2025-12-04T08:57:43.7256522Z * [new branch] gh/fduwjj/226/orig -> origin/gh/fduwjj/226/orig 2025-12-04T08:57:43.7258452Z * [new branch] gh/fduwjj/229/base -> origin/gh/fduwjj/229/base 2025-12-04T08:57:43.7259485Z * [new branch] gh/fduwjj/229/head -> origin/gh/fduwjj/229/head 2025-12-04T08:57:43.7260573Z * [new branch] gh/fduwjj/229/orig -> origin/gh/fduwjj/229/orig 2025-12-04T08:57:43.7262098Z * [new branch] gh/fduwjj/233/base -> origin/gh/fduwjj/233/base 2025-12-04T08:57:43.7263290Z * [new branch] gh/fduwjj/233/head -> origin/gh/fduwjj/233/head 2025-12-04T08:57:43.7264395Z * [new branch] gh/fduwjj/233/orig -> origin/gh/fduwjj/233/orig 2025-12-04T08:57:43.7265992Z * [new branch] gh/fduwjj/234/base -> origin/gh/fduwjj/234/base 2025-12-04T08:57:43.7267110Z * [new branch] gh/fduwjj/234/head -> origin/gh/fduwjj/234/head 2025-12-04T08:57:43.7268220Z * [new branch] gh/fduwjj/234/orig -> origin/gh/fduwjj/234/orig 2025-12-04T08:57:43.7269784Z * [new branch] gh/fduwjj/235/base -> origin/gh/fduwjj/235/base 2025-12-04T08:57:43.7270917Z * [new branch] gh/fduwjj/235/head -> origin/gh/fduwjj/235/head 2025-12-04T08:57:43.7272043Z * [new branch] gh/fduwjj/235/orig -> origin/gh/fduwjj/235/orig 2025-12-04T08:57:43.7273411Z * [new branch] gh/fduwjj/236/base -> origin/gh/fduwjj/236/base 2025-12-04T08:57:43.7274609Z * [new branch] gh/fduwjj/236/head -> origin/gh/fduwjj/236/head 2025-12-04T08:57:43.7275638Z * [new branch] gh/fduwjj/236/orig -> origin/gh/fduwjj/236/orig 2025-12-04T08:57:43.7276927Z * [new branch] gh/fduwjj/237/base -> origin/gh/fduwjj/237/base 2025-12-04T08:57:43.7278005Z * [new branch] gh/fduwjj/237/head -> origin/gh/fduwjj/237/head 2025-12-04T08:57:43.7279081Z * [new branch] gh/fduwjj/237/orig -> origin/gh/fduwjj/237/orig 2025-12-04T08:57:43.7280648Z * [new branch] gh/fduwjj/238/base -> origin/gh/fduwjj/238/base 2025-12-04T08:57:43.7281801Z * [new branch] gh/fduwjj/238/head -> origin/gh/fduwjj/238/head 2025-12-04T08:57:43.7282981Z * [new branch] gh/fduwjj/238/orig -> origin/gh/fduwjj/238/orig 2025-12-04T08:57:43.7285056Z * [new branch] gh/fduwjj/239/base -> origin/gh/fduwjj/239/base 2025-12-04T08:57:43.7286298Z * [new branch] gh/fduwjj/239/head -> origin/gh/fduwjj/239/head 2025-12-04T08:57:43.7287392Z * [new branch] gh/fduwjj/239/orig -> origin/gh/fduwjj/239/orig 2025-12-04T08:57:43.7289138Z * [new branch] 
gh/fegin/332/base -> origin/gh/fegin/332/base 2025-12-04T08:57:43.7290239Z * [new branch] gh/fegin/332/head -> origin/gh/fegin/332/head 2025-12-04T08:57:43.7291370Z * [new branch] gh/fegin/332/orig -> origin/gh/fegin/332/orig 2025-12-04T08:57:43.7292836Z * [new branch] gh/fegin/333/base -> origin/gh/fegin/333/base 2025-12-04T08:57:43.7293973Z * [new branch] gh/fegin/333/head -> origin/gh/fegin/333/head 2025-12-04T08:57:43.7295123Z * [new branch] gh/fegin/333/orig -> origin/gh/fegin/333/orig 2025-12-04T08:57:43.7296894Z * [new branch] gh/fegin/334/base -> origin/gh/fegin/334/base 2025-12-04T08:57:43.7298087Z * [new branch] gh/fegin/334/head -> origin/gh/fegin/334/head 2025-12-04T08:57:43.7299376Z * [new branch] gh/fegin/334/orig -> origin/gh/fegin/334/orig 2025-12-04T08:57:43.7300863Z * [new branch] gh/fegin/335/base -> origin/gh/fegin/335/base 2025-12-04T08:57:43.7302013Z * [new branch] gh/fegin/335/head -> origin/gh/fegin/335/head 2025-12-04T08:57:43.7303148Z * [new branch] gh/fegin/335/orig -> origin/gh/fegin/335/orig 2025-12-04T08:57:43.7304866Z * [new branch] gh/fffrog/160/base -> origin/gh/fffrog/160/base 2025-12-04T08:57:43.7305988Z * [new branch] gh/fffrog/160/head -> origin/gh/fffrog/160/head 2025-12-04T08:57:43.7307438Z * [new branch] gh/fffrog/177/base -> origin/gh/fffrog/177/base 2025-12-04T08:57:43.7308673Z * [new branch] gh/fffrog/177/head -> origin/gh/fffrog/177/head 2025-12-04T08:57:43.7309795Z * [new branch] gh/fffrog/177/orig -> origin/gh/fffrog/177/orig 2025-12-04T08:57:43.7311308Z * [new branch] gh/fffrog/178/base -> origin/gh/fffrog/178/base 2025-12-04T08:57:43.7312438Z * [new branch] gh/fffrog/178/head -> origin/gh/fffrog/178/head 2025-12-04T08:57:43.7313553Z * [new branch] gh/fffrog/178/orig -> origin/gh/fffrog/178/orig 2025-12-04T08:57:43.7315013Z * [new branch] gh/fffrog/181/base -> origin/gh/fffrog/181/base 2025-12-04T08:57:43.7316064Z * [new branch] gh/fffrog/181/head -> origin/gh/fffrog/181/head 2025-12-04T08:57:43.7317254Z * [new branch] gh/fffrog/181/orig -> origin/gh/fffrog/181/orig 2025-12-04T08:57:43.7318656Z * [new branch] gh/fffrog/183/base -> origin/gh/fffrog/183/base 2025-12-04T08:57:43.7319702Z * [new branch] gh/fffrog/183/head -> origin/gh/fffrog/183/head 2025-12-04T08:57:43.7321001Z * [new branch] gh/fffrog/183/orig -> origin/gh/fffrog/183/orig 2025-12-04T08:57:43.7323513Z * [new branch] gh/fxdawnn/10/base -> origin/gh/fxdawnn/10/base 2025-12-04T08:57:43.7324604Z * [new branch] gh/fxdawnn/10/head -> origin/gh/fxdawnn/10/head 2025-12-04T08:57:43.7325827Z * [new branch] gh/fxdawnn/10/orig -> origin/gh/fxdawnn/10/orig 2025-12-04T08:57:43.7327498Z * [new branch] gh/fxdawnn/11/base -> origin/gh/fxdawnn/11/base 2025-12-04T08:57:43.7328560Z * [new branch] gh/fxdawnn/11/head -> origin/gh/fxdawnn/11/head 2025-12-04T08:57:43.7329919Z * [new branch] gh/fxdawnn/11/orig -> origin/gh/fxdawnn/11/orig 2025-12-04T08:57:43.7331272Z * [new branch] gh/fxdawnn/12/base -> origin/gh/fxdawnn/12/base 2025-12-04T08:57:43.7332509Z * [new branch] gh/fxdawnn/12/head -> origin/gh/fxdawnn/12/head 2025-12-04T08:57:43.7334151Z * [new branch] gh/fxdawnn/12/orig -> origin/gh/fxdawnn/12/orig 2025-12-04T08:57:43.7335610Z * [new branch] gh/fxdawnn/13/base -> origin/gh/fxdawnn/13/base 2025-12-04T08:57:43.7337197Z * [new branch] gh/fxdawnn/13/head -> origin/gh/fxdawnn/13/head 2025-12-04T08:57:43.7338657Z * [new branch] gh/fxdawnn/13/orig -> origin/gh/fxdawnn/13/orig 2025-12-04T08:57:43.7340252Z * [new branch] gh/fxdawnn/14/base -> origin/gh/fxdawnn/14/base 2025-12-04T08:57:43.7341345Z * 
[new branch] gh/fxdawnn/14/head -> origin/gh/fxdawnn/14/head 2025-12-04T08:57:43.7342537Z * [new branch] gh/fxdawnn/14/orig -> origin/gh/fxdawnn/14/orig 2025-12-04T08:57:43.7344002Z * [new branch] gh/fxdawnn/15/base -> origin/gh/fxdawnn/15/base 2025-12-04T08:57:43.7345146Z * [new branch] gh/fxdawnn/15/head -> origin/gh/fxdawnn/15/head 2025-12-04T08:57:43.7346258Z * [new branch] gh/fxdawnn/15/orig -> origin/gh/fxdawnn/15/orig 2025-12-04T08:57:43.7347754Z * [new branch] gh/fxdawnn/6/base -> origin/gh/fxdawnn/6/base 2025-12-04T08:57:43.7348982Z * [new branch] gh/fxdawnn/6/head -> origin/gh/fxdawnn/6/head 2025-12-04T08:57:43.7350130Z * [new branch] gh/fxdawnn/6/orig -> origin/gh/fxdawnn/6/orig 2025-12-04T08:57:43.7351737Z * [new branch] gh/fxdawnn/7/base -> origin/gh/fxdawnn/7/base 2025-12-04T08:57:43.7352895Z * [new branch] gh/fxdawnn/7/head -> origin/gh/fxdawnn/7/head 2025-12-04T08:57:43.7353958Z * [new branch] gh/fxdawnn/7/orig -> origin/gh/fxdawnn/7/orig 2025-12-04T08:57:43.7355442Z * [new branch] gh/fxdawnn/9/base -> origin/gh/fxdawnn/9/base 2025-12-04T08:57:43.7356489Z * [new branch] gh/fxdawnn/9/head -> origin/gh/fxdawnn/9/head 2025-12-04T08:57:43.7357634Z * [new branch] gh/fxdawnn/9/orig -> origin/gh/fxdawnn/9/orig 2025-12-04T08:57:43.7359378Z * [new branch] gh/galv/1/base -> origin/gh/galv/1/base 2025-12-04T08:57:43.7360452Z * [new branch] gh/galv/1/head -> origin/gh/galv/1/head 2025-12-04T08:57:43.7361553Z * [new branch] gh/galv/1/orig -> origin/gh/galv/1/orig 2025-12-04T08:57:43.7363055Z * [new branch] gh/galv/2/base -> origin/gh/galv/2/base 2025-12-04T08:57:43.7364129Z * [new branch] gh/galv/2/head -> origin/gh/galv/2/head 2025-12-04T08:57:43.7365286Z * [new branch] gh/galv/2/orig -> origin/gh/galv/2/orig 2025-12-04T08:57:43.7366721Z * [new branch] gh/galv/3/base -> origin/gh/galv/3/base 2025-12-04T08:57:43.7367832Z * [new branch] gh/galv/3/head -> origin/gh/galv/3/head 2025-12-04T08:57:43.7369191Z * [new branch] gh/galv/3/orig -> origin/gh/galv/3/orig 2025-12-04T08:57:43.7371000Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-12-04T08:57:43.7372116Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-12-04T08:57:43.7373212Z * [new branch] gh/guangyey/134/orig -> origin/gh/guangyey/134/orig 2025-12-04T08:57:43.7374693Z * [new branch] gh/guangyey/163/base -> origin/gh/guangyey/163/base 2025-12-04T08:57:43.7375811Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-12-04T08:57:43.7377269Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-12-04T08:57:43.7378763Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-12-04T08:57:43.7380296Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-12-04T08:57:43.7381489Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-12-04T08:57:43.7382984Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-12-04T08:57:43.7384115Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-12-04T08:57:43.7385255Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-12-04T08:57:43.7386866Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-12-04T08:57:43.7387989Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-12-04T08:57:43.7389222Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-12-04T08:57:43.7390684Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 
2025-12-04T08:57:43.7391743Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-12-04T08:57:43.7392846Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-12-04T08:57:43.7394821Z * [new branch] gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-12-04T08:57:43.7396002Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-12-04T08:57:43.7397045Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-12-04T08:57:43.7398461Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-12-04T08:57:43.7399567Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-12-04T08:57:43.7400641Z * [new branch] gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-12-04T08:57:43.7402201Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-12-04T08:57:43.7403309Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-12-04T08:57:43.7404908Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-12-04T08:57:43.7406399Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-12-04T08:57:43.7407495Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-12-04T08:57:43.7408601Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-12-04T08:57:43.7410030Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-12-04T08:57:43.7411237Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-12-04T08:57:43.7412304Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 2025-12-04T08:57:43.7413727Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-12-04T08:57:43.7414894Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-12-04T08:57:43.7415923Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-12-04T08:57:43.7417980Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-12-04T08:57:43.7419565Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-12-04T08:57:43.7420966Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-12-04T08:57:43.7425120Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-12-04T08:57:43.7427280Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-12-04T08:57:43.7428023Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-12-04T08:57:43.7429557Z * [new branch] gh/guangyey/208/base -> origin/gh/guangyey/208/base 2025-12-04T08:57:43.7430570Z * [new branch] gh/guangyey/208/head -> origin/gh/guangyey/208/head 2025-12-04T08:57:43.7431705Z * [new branch] gh/guangyey/208/orig -> origin/gh/guangyey/208/orig 2025-12-04T08:57:43.7433357Z * [new branch] gh/guangyey/228/base -> origin/gh/guangyey/228/base 2025-12-04T08:57:43.7434366Z * [new branch] gh/guangyey/228/head -> origin/gh/guangyey/228/head 2025-12-04T08:57:43.7435483Z * [new branch] gh/guangyey/228/orig -> origin/gh/guangyey/228/orig 2025-12-04T08:57:43.7437603Z * [new branch] gh/guangyey/230/base -> origin/gh/guangyey/230/base 2025-12-04T08:57:43.7438570Z * [new branch] gh/guangyey/230/head -> origin/gh/guangyey/230/head 2025-12-04T08:57:43.7440172Z * [new branch] gh/guangyey/230/orig -> origin/gh/guangyey/230/orig 2025-12-04T08:57:43.7441656Z * [new branch] gh/guangyey/231/base -> origin/gh/guangyey/231/base 2025-12-04T08:57:43.7442594Z * [new branch] gh/guangyey/231/head -> origin/gh/guangyey/231/head 
2025-12-04T08:57:43.7443719Z * [new branch] gh/guangyey/231/orig -> origin/gh/guangyey/231/orig 2025-12-04T08:57:43.7445301Z * [new branch] gh/guangyey/232/base -> origin/gh/guangyey/232/base 2025-12-04T08:57:43.7446293Z * [new branch] gh/guangyey/232/head -> origin/gh/guangyey/232/head 2025-12-04T08:57:43.7447400Z * [new branch] gh/guangyey/232/orig -> origin/gh/guangyey/232/orig 2025-12-04T08:57:43.7448959Z * [new branch] gh/guangyey/233/base -> origin/gh/guangyey/233/base 2025-12-04T08:57:43.7449973Z * [new branch] gh/guangyey/233/head -> origin/gh/guangyey/233/head 2025-12-04T08:57:43.7451052Z * [new branch] gh/guangyey/233/orig -> origin/gh/guangyey/233/orig 2025-12-04T08:57:43.7452738Z * [new branch] gh/guangyey/234/base -> origin/gh/guangyey/234/base 2025-12-04T08:57:43.7453732Z * [new branch] gh/guangyey/234/head -> origin/gh/guangyey/234/head 2025-12-04T08:57:43.7454900Z * [new branch] gh/guangyey/234/orig -> origin/gh/guangyey/234/orig 2025-12-04T08:57:43.7456547Z * [new branch] gh/guangyey/235/base -> origin/gh/guangyey/235/base 2025-12-04T08:57:43.7457808Z * [new branch] gh/guangyey/235/head -> origin/gh/guangyey/235/head 2025-12-04T08:57:43.7458919Z * [new branch] gh/guangyey/235/orig -> origin/gh/guangyey/235/orig 2025-12-04T08:57:43.7460528Z * [new branch] gh/guangyey/236/base -> origin/gh/guangyey/236/base 2025-12-04T08:57:43.7461536Z * [new branch] gh/guangyey/236/head -> origin/gh/guangyey/236/head 2025-12-04T08:57:43.7462748Z * [new branch] gh/guangyey/236/orig -> origin/gh/guangyey/236/orig 2025-12-04T08:57:43.7464948Z * [new branch] gh/guangyey/237/base -> origin/gh/guangyey/237/base 2025-12-04T08:57:43.7465572Z * [new branch] gh/guangyey/237/head -> origin/gh/guangyey/237/head 2025-12-04T08:57:43.7466590Z * [new branch] gh/guangyey/237/orig -> origin/gh/guangyey/237/orig 2025-12-04T08:57:43.7468253Z * [new branch] gh/guangyey/238/base -> origin/gh/guangyey/238/base 2025-12-04T08:57:43.7469397Z * [new branch] gh/guangyey/238/head -> origin/gh/guangyey/238/head 2025-12-04T08:57:43.7470972Z * [new branch] gh/guangyey/239/base -> origin/gh/guangyey/239/base 2025-12-04T08:57:43.7471917Z * [new branch] gh/guangyey/239/head -> origin/gh/guangyey/239/head 2025-12-04T08:57:43.7473020Z * [new branch] gh/guangyey/239/orig -> origin/gh/guangyey/239/orig 2025-12-04T08:57:43.7474674Z * [new branch] gh/guangyey/240/base -> origin/gh/guangyey/240/base 2025-12-04T08:57:43.7475665Z * [new branch] gh/guangyey/240/head -> origin/gh/guangyey/240/head 2025-12-04T08:57:43.7476730Z * [new branch] gh/guangyey/240/orig -> origin/gh/guangyey/240/orig 2025-12-04T08:57:43.7478272Z * [new branch] gh/guangyey/241/base -> origin/gh/guangyey/241/base 2025-12-04T08:57:43.7479250Z * [new branch] gh/guangyey/241/head -> origin/gh/guangyey/241/head 2025-12-04T08:57:43.7480372Z * [new branch] gh/guangyey/241/orig -> origin/gh/guangyey/241/orig 2025-12-04T08:57:43.7482374Z * [new branch] gh/guangyey/242/base -> origin/gh/guangyey/242/base 2025-12-04T08:57:43.7483438Z * [new branch] gh/guangyey/242/head -> origin/gh/guangyey/242/head 2025-12-04T08:57:43.7484556Z * [new branch] gh/guangyey/242/orig -> origin/gh/guangyey/242/orig 2025-12-04T08:57:43.7486146Z * [new branch] gh/guangyey/243/base -> origin/gh/guangyey/243/base 2025-12-04T08:57:43.7487348Z * [new branch] gh/guangyey/243/head -> origin/gh/guangyey/243/head 2025-12-04T08:57:43.7488460Z * [new branch] gh/guangyey/243/orig -> origin/gh/guangyey/243/orig 2025-12-04T08:57:43.7490094Z * [new branch] gh/guangyey/244/base -> origin/gh/guangyey/244/base 
2025-12-04T08:57:43.7491123Z * [new branch] gh/guangyey/244/head -> origin/gh/guangyey/244/head 2025-12-04T08:57:43.7492278Z * [new branch] gh/guangyey/244/orig -> origin/gh/guangyey/244/orig 2025-12-04T08:57:43.7494379Z * [new branch] gh/guangyey/245/base -> origin/gh/guangyey/245/base 2025-12-04T08:57:43.7495429Z * [new branch] gh/guangyey/245/head -> origin/gh/guangyey/245/head 2025-12-04T08:57:43.7496634Z * [new branch] gh/guangyey/245/orig -> origin/gh/guangyey/245/orig 2025-12-04T08:57:43.7498530Z * [new branch] gh/guangyey/246/base -> origin/gh/guangyey/246/base 2025-12-04T08:57:43.7499599Z * [new branch] gh/guangyey/246/head -> origin/gh/guangyey/246/head 2025-12-04T08:57:43.7500703Z * [new branch] gh/guangyey/246/orig -> origin/gh/guangyey/246/orig 2025-12-04T08:57:43.7502395Z * [new branch] gh/guangyey/247/base -> origin/gh/guangyey/247/base 2025-12-04T08:57:43.7503404Z * [new branch] gh/guangyey/247/head -> origin/gh/guangyey/247/head 2025-12-04T08:57:43.7504533Z * [new branch] gh/guangyey/247/orig -> origin/gh/guangyey/247/orig 2025-12-04T08:57:43.7506207Z * [new branch] gh/guangyey/248/base -> origin/gh/guangyey/248/base 2025-12-04T08:57:43.7507186Z * [new branch] gh/guangyey/248/head -> origin/gh/guangyey/248/head 2025-12-04T08:57:43.7508307Z * [new branch] gh/guangyey/248/orig -> origin/gh/guangyey/248/orig 2025-12-04T08:57:43.7510098Z * [new branch] gh/guangyey/249/base -> origin/gh/guangyey/249/base 2025-12-04T08:57:43.7510988Z * [new branch] gh/guangyey/249/head -> origin/gh/guangyey/249/head 2025-12-04T08:57:43.7512116Z * [new branch] gh/guangyey/249/orig -> origin/gh/guangyey/249/orig 2025-12-04T08:57:43.7513683Z * [new branch] gh/guangyey/250/base -> origin/gh/guangyey/250/base 2025-12-04T08:57:43.7514833Z * [new branch] gh/guangyey/250/head -> origin/gh/guangyey/250/head 2025-12-04T08:57:43.7515929Z * [new branch] gh/guangyey/250/orig -> origin/gh/guangyey/250/orig 2025-12-04T08:57:43.7517412Z * [new branch] gh/guangyey/251/base -> origin/gh/guangyey/251/base 2025-12-04T08:57:43.7518451Z * [new branch] gh/guangyey/251/head -> origin/gh/guangyey/251/head 2025-12-04T08:57:43.7519581Z * [new branch] gh/guangyey/251/orig -> origin/gh/guangyey/251/orig 2025-12-04T08:57:43.7521476Z * [new branch] gh/guangyey/252/base -> origin/gh/guangyey/252/base 2025-12-04T08:57:43.7522579Z * [new branch] gh/guangyey/252/head -> origin/gh/guangyey/252/head 2025-12-04T08:57:43.7523737Z * [new branch] gh/guangyey/252/orig -> origin/gh/guangyey/252/orig 2025-12-04T08:57:43.7525339Z * [new branch] gh/guangyey/253/base -> origin/gh/guangyey/253/base 2025-12-04T08:57:43.7526377Z * [new branch] gh/guangyey/253/head -> origin/gh/guangyey/253/head 2025-12-04T08:57:43.7527484Z * [new branch] gh/guangyey/253/orig -> origin/gh/guangyey/253/orig 2025-12-04T08:57:43.7529102Z * [new branch] gh/guangyey/254/base -> origin/gh/guangyey/254/base 2025-12-04T08:57:43.7530209Z * [new branch] gh/guangyey/254/head -> origin/gh/guangyey/254/head 2025-12-04T08:57:43.7531350Z * [new branch] gh/guangyey/254/orig -> origin/gh/guangyey/254/orig 2025-12-04T08:57:43.7532952Z * [new branch] gh/guangyey/255/base -> origin/gh/guangyey/255/base 2025-12-04T08:57:43.7534029Z * [new branch] gh/guangyey/255/head -> origin/gh/guangyey/255/head 2025-12-04T08:57:43.7535182Z * [new branch] gh/guangyey/255/orig -> origin/gh/guangyey/255/orig 2025-12-04T08:57:43.7537542Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-12-04T08:57:43.7538619Z * [new branch] gh/guilhermeleobas/107/head -> 
origin/gh/guilhermeleobas/107/head 2025-12-04T08:57:43.7539764Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-12-04T08:57:43.7541048Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-12-04T08:57:43.7542095Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-12-04T08:57:43.7543350Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-12-04T08:57:43.7544865Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-12-04T08:57:43.7545967Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-12-04T08:57:43.7549168Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-12-04T08:57:43.7550655Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-12-04T08:57:43.7551726Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-12-04T08:57:43.7552854Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-12-04T08:57:43.7554275Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-12-04T08:57:43.7555662Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-12-04T08:57:43.7556624Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-12-04T08:57:43.7558121Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-12-04T08:57:43.7559186Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-12-04T08:57:43.7561218Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-12-04T08:57:43.7562317Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-12-04T08:57:43.7563240Z * [new branch] gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-12-04T08:57:43.7564273Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-12-04T08:57:43.7565893Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-12-04T08:57:43.7566936Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-12-04T08:57:43.7567872Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-12-04T08:57:43.7569477Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-12-04T08:57:43.7570416Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-12-04T08:57:43.7571824Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-12-04T08:57:43.7573339Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-12-04T08:57:43.7574343Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-12-04T08:57:43.7575280Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-12-04T08:57:43.7577194Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-12-04T08:57:43.7578362Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-12-04T08:57:43.7579440Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-12-04T08:57:43.7581050Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 
2025-12-04T08:57:43.7582054Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-12-04T08:57:43.7583193Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-12-04T08:57:43.7584754Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-12-04T08:57:43.7585761Z * [new branch] gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-12-04T08:57:43.7586883Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 2025-12-04T08:57:43.7588392Z * [new branch] gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base 2025-12-04T08:57:43.7589550Z * [new branch] gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head 2025-12-04T08:57:43.7592796Z * [new branch] gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig 2025-12-04T08:57:43.7593943Z * [new branch] gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base 2025-12-04T08:57:43.7594475Z * [new branch] gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head 2025-12-04T08:57:43.7595019Z * [new branch] gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig 2025-12-04T08:57:43.7595848Z * [new branch] gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base 2025-12-04T08:57:43.7597109Z * [new branch] gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head 2025-12-04T08:57:43.7598104Z * [new branch] gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig 2025-12-04T08:57:43.7600051Z * [new branch] gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base 2025-12-04T08:57:43.7600994Z * [new branch] gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head 2025-12-04T08:57:43.7602136Z * [new branch] gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig 2025-12-04T08:57:43.7603765Z * [new branch] gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base 2025-12-04T08:57:43.7604729Z * [new branch] gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head 2025-12-04T08:57:43.7605824Z * [new branch] gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig 2025-12-04T08:57:43.7607541Z * [new branch] gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base 2025-12-04T08:57:43.7608474Z * [new branch] gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head 2025-12-04T08:57:43.7609552Z * [new branch] gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig 2025-12-04T08:57:43.7611179Z * [new branch] gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base 2025-12-04T08:57:43.7612197Z * [new branch] gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head 2025-12-04T08:57:43.7613304Z * [new branch] gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig 2025-12-04T08:57:43.7614887Z * [new branch] gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base 2025-12-04T08:57:43.7616072Z * [new branch] gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head 2025-12-04T08:57:43.7617560Z * [new branch] gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig 2025-12-04T08:57:43.7619413Z * [new branch] gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base 2025-12-04T08:57:43.7620412Z * [new branch] gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head 2025-12-04T08:57:43.7621859Z * [new branch] gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig 2025-12-04T08:57:43.7623834Z * 
[new branch] gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base 2025-12-04T08:57:43.7624628Z * [new branch] gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head 2025-12-04T08:57:43.7625866Z * [new branch] gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig 2025-12-04T08:57:43.7627462Z * [new branch] gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base 2025-12-04T08:57:43.7628541Z * [new branch] gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head 2025-12-04T08:57:43.7629677Z * [new branch] gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig 2025-12-04T08:57:43.7631238Z * [new branch] gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base 2025-12-04T08:57:43.7632262Z * [new branch] gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head 2025-12-04T08:57:43.7634098Z * [new branch] gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig 2025-12-04T08:57:43.7635611Z * [new branch] gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base 2025-12-04T08:57:43.7636734Z * [new branch] gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head 2025-12-04T08:57:43.7637791Z * [new branch] gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig 2025-12-04T08:57:43.7639474Z * [new branch] gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base 2025-12-04T08:57:43.7640715Z * [new branch] gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head 2025-12-04T08:57:43.7641597Z * [new branch] gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig 2025-12-04T08:57:43.7643167Z * [new branch] gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base 2025-12-04T08:57:43.7644246Z * [new branch] gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head 2025-12-04T08:57:43.7645265Z * [new branch] gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig 2025-12-04T08:57:43.7646902Z * [new branch] gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base 2025-12-04T08:57:43.7647884Z * [new branch] gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head 2025-12-04T08:57:43.7648970Z * [new branch] gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig 2025-12-04T08:57:43.7650621Z * [new branch] gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base 2025-12-04T08:57:43.7651553Z * [new branch] gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head 2025-12-04T08:57:43.7652673Z * [new branch] gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig 2025-12-04T08:57:43.7654420Z * [new branch] gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base 2025-12-04T08:57:43.7655784Z * [new branch] gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head 2025-12-04T08:57:43.7656817Z * [new branch] gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig 2025-12-04T08:57:43.7658951Z * [new branch] gh/hameerabbasi/1/base -> origin/gh/hameerabbasi/1/base 2025-12-04T08:57:43.7659877Z * [new branch] gh/hameerabbasi/1/head -> origin/gh/hameerabbasi/1/head 2025-12-04T08:57:43.7661388Z * [new branch] gh/hameerabbasi/2/base -> origin/gh/hameerabbasi/2/base 2025-12-04T08:57:43.7662369Z * [new branch] gh/hameerabbasi/2/head -> origin/gh/hameerabbasi/2/head 2025-12-04T08:57:43.7663409Z * [new branch] gh/hameerabbasi/2/orig -> origin/gh/hameerabbasi/2/orig 2025-12-04T08:57:43.7664829Z * [new branch] gh/hameerabbasi/3/base -> origin/gh/hameerabbasi/3/base 
2025-12-04T08:57:43.7665946Z * [new branch] gh/hameerabbasi/3/head -> origin/gh/hameerabbasi/3/head 2025-12-04T08:57:43.7667717Z * [new branch] gh/hameerabbasi/3/orig -> origin/gh/hameerabbasi/3/orig 2025-12-04T08:57:43.7669350Z * [new branch] gh/hameerabbasi/4/base -> origin/gh/hameerabbasi/4/base 2025-12-04T08:57:43.7670538Z * [new branch] gh/hameerabbasi/4/head -> origin/gh/hameerabbasi/4/head 2025-12-04T08:57:43.7671422Z * [new branch] gh/hameerabbasi/4/orig -> origin/gh/hameerabbasi/4/orig 2025-12-04T08:57:43.7673556Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-12-04T08:57:43.7674924Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-12-04T08:57:43.7676329Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-12-04T08:57:43.7677766Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-12-04T08:57:43.7679176Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-12-04T08:57:43.7680578Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-12-04T08:57:43.7682411Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-12-04T08:57:43.7683489Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 2025-12-04T08:57:43.7685288Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-12-04T08:57:43.7686477Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-12-04T08:57:43.7687969Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-12-04T08:57:43.7689053Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-12-04T08:57:43.7690130Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-12-04T08:57:43.7691586Z * [new branch] gh/isuruf/158/base -> origin/gh/isuruf/158/base 2025-12-04T08:57:43.7692620Z * [new branch] gh/isuruf/158/head -> origin/gh/isuruf/158/head 2025-12-04T08:57:43.7694080Z * [new branch] gh/isuruf/159/base -> origin/gh/isuruf/159/base 2025-12-04T08:57:43.7695169Z * [new branch] gh/isuruf/159/head -> origin/gh/isuruf/159/head 2025-12-04T08:57:43.7696869Z * [new branch] gh/isuruf/160/base -> origin/gh/isuruf/160/base 2025-12-04T08:57:43.7698066Z * [new branch] gh/isuruf/160/head -> origin/gh/isuruf/160/head 2025-12-04T08:57:43.7699184Z * [new branch] gh/isuruf/160/orig -> origin/gh/isuruf/160/orig 2025-12-04T08:57:43.7701213Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-12-04T08:57:43.7702328Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-12-04T08:57:43.7703433Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-12-04T08:57:43.7705221Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-12-04T08:57:43.7706366Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-12-04T08:57:43.7707489Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-12-04T08:57:43.7709154Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-12-04T08:57:43.7710295Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-12-04T08:57:43.7711399Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-12-04T08:57:43.7712838Z * [new branch] gh/jamesjwu/196/base -> origin/gh/jamesjwu/196/base 2025-12-04T08:57:43.7713914Z * [new branch] gh/jamesjwu/196/head -> origin/gh/jamesjwu/196/head 2025-12-04T08:57:43.7714992Z * [new branch] gh/jamesjwu/196/orig -> origin/gh/jamesjwu/196/orig 2025-12-04T08:57:43.7716426Z * [new branch] gh/jamesjwu/198/base -> origin/gh/jamesjwu/198/base 
2025-12-04T08:57:43.7717468Z * [new branch] gh/jamesjwu/198/head -> origin/gh/jamesjwu/198/head 2025-12-04T08:57:43.7718559Z * [new branch] gh/jamesjwu/198/orig -> origin/gh/jamesjwu/198/orig 2025-12-04T08:57:43.7719980Z * [new branch] gh/jamesjwu/207/base -> origin/gh/jamesjwu/207/base 2025-12-04T08:57:43.7721675Z * [new branch] gh/jamesjwu/207/head -> origin/gh/jamesjwu/207/head 2025-12-04T08:57:43.7722815Z * [new branch] gh/jamesjwu/207/orig -> origin/gh/jamesjwu/207/orig 2025-12-04T08:57:43.7724562Z * [new branch] gh/jamesjwu/208/base -> origin/gh/jamesjwu/208/base 2025-12-04T08:57:43.7725692Z * [new branch] gh/jamesjwu/208/head -> origin/gh/jamesjwu/208/head 2025-12-04T08:57:43.7726800Z * [new branch] gh/jamesjwu/208/orig -> origin/gh/jamesjwu/208/orig 2025-12-04T08:57:43.7728364Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-12-04T08:57:43.7729488Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-12-04T08:57:43.7730864Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-12-04T08:57:43.7731919Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-12-04T08:57:43.7733367Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-12-04T08:57:43.7734340Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-12-04T08:57:43.7735681Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-12-04T08:57:43.7736949Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-12-04T08:57:43.7738474Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-12-04T08:57:43.7739549Z * [new branch] gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-12-04T08:57:43.7740966Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-12-04T08:57:43.7742023Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-12-04T08:57:43.7743364Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-12-04T08:57:43.7744451Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-12-04T08:57:43.7745770Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-12-04T08:57:43.7746803Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-12-04T08:57:43.7748668Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-12-04T08:57:43.7749900Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-12-04T08:57:43.7751198Z * [new branch] gh/jamesjwu/61/base -> origin/gh/jamesjwu/61/base 2025-12-04T08:57:43.7752354Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-12-04T08:57:43.7753747Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-12-04T08:57:43.7754783Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-12-04T08:57:43.7756041Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-12-04T08:57:43.7757124Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-12-04T08:57:43.7759041Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-12-04T08:57:43.7760122Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-12-04T08:57:43.7761915Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-12-04T08:57:43.7762926Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-12-04T08:57:43.7764790Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-12-04T08:57:43.7766017Z * [new branch] 
gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-12-04T08:57:43.7767124Z * [new branch] gh/janeyx99/165/orig -> origin/gh/janeyx99/165/orig 2025-12-04T08:57:43.7768442Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-12-04T08:57:43.7769534Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-12-04T08:57:43.7770607Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-12-04T08:57:43.7772340Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-12-04T08:57:43.7773459Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-12-04T08:57:43.7774553Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-12-04T08:57:43.7776012Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-12-04T08:57:43.7777494Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-12-04T08:57:43.7778716Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-12-04T08:57:43.7780560Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-12-04T08:57:43.7781840Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-12-04T08:57:43.7783170Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-12-04T08:57:43.7784278Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-12-04T08:57:43.7785735Z * [new branch] gh/janeyx99/305/base -> origin/gh/janeyx99/305/base 2025-12-04T08:57:43.7786888Z * [new branch] gh/janeyx99/305/head -> origin/gh/janeyx99/305/head 2025-12-04T08:57:43.7788203Z * [new branch] gh/janeyx99/306/base -> origin/gh/janeyx99/306/base 2025-12-04T08:57:43.7789516Z * [new branch] gh/janeyx99/306/head -> origin/gh/janeyx99/306/head 2025-12-04T08:57:43.7791081Z * [new branch] gh/janeyx99/314/base -> origin/gh/janeyx99/314/base 2025-12-04T08:57:43.7792138Z * [new branch] gh/janeyx99/314/head -> origin/gh/janeyx99/314/head 2025-12-04T08:57:43.7793224Z * [new branch] gh/janeyx99/314/orig -> origin/gh/janeyx99/314/orig 2025-12-04T08:57:43.7794853Z * [new branch] gh/janeyx99/315/base -> origin/gh/janeyx99/315/base 2025-12-04T08:57:43.7795903Z * [new branch] gh/janeyx99/315/head -> origin/gh/janeyx99/315/head 2025-12-04T08:57:43.7796871Z * [new branch] gh/janeyx99/315/orig -> origin/gh/janeyx99/315/orig 2025-12-04T08:57:43.7798362Z * [new branch] gh/janeyx99/316/base -> origin/gh/janeyx99/316/base 2025-12-04T08:57:43.7799485Z * [new branch] gh/janeyx99/316/head -> origin/gh/janeyx99/316/head 2025-12-04T08:57:43.7800585Z * [new branch] gh/janeyx99/316/orig -> origin/gh/janeyx99/316/orig 2025-12-04T08:57:43.7802200Z * [new branch] gh/janeyx99/317/base -> origin/gh/janeyx99/317/base 2025-12-04T08:57:43.7803294Z * [new branch] gh/janeyx99/317/head -> origin/gh/janeyx99/317/head 2025-12-04T08:57:43.7804357Z * [new branch] gh/janeyx99/317/orig -> origin/gh/janeyx99/317/orig 2025-12-04T08:57:43.7805874Z * [new branch] gh/janeyx99/325/base -> origin/gh/janeyx99/325/base 2025-12-04T08:57:43.7807005Z * [new branch] gh/janeyx99/325/head -> origin/gh/janeyx99/325/head 2025-12-04T08:57:43.7808120Z * [new branch] gh/janeyx99/325/orig -> origin/gh/janeyx99/325/orig 2025-12-04T08:57:43.7809604Z * [new branch] gh/janeyx99/327/base -> origin/gh/janeyx99/327/base 2025-12-04T08:57:43.7810815Z * [new branch] gh/janeyx99/327/head -> origin/gh/janeyx99/327/head 2025-12-04T08:57:43.7811900Z * [new branch] gh/janeyx99/327/orig -> origin/gh/janeyx99/327/orig 2025-12-04T08:57:43.7813376Z * [new branch] 
gh/janeyx99/328/base -> origin/gh/janeyx99/328/base 2025-12-04T08:57:43.7814501Z * [new branch] gh/janeyx99/328/head -> origin/gh/janeyx99/328/head 2025-12-04T08:57:43.7815657Z * [new branch] gh/janeyx99/328/orig -> origin/gh/janeyx99/328/orig 2025-12-04T08:57:43.7817322Z * [new branch] gh/janeyx99/329/base -> origin/gh/janeyx99/329/base 2025-12-04T08:57:43.7818494Z * [new branch] gh/janeyx99/329/head -> origin/gh/janeyx99/329/head 2025-12-04T08:57:43.7819767Z * [new branch] gh/janeyx99/329/orig -> origin/gh/janeyx99/329/orig 2025-12-04T08:57:43.7822571Z * [new branch] gh/janeyx99/330/base -> origin/gh/janeyx99/330/base 2025-12-04T08:57:43.7823804Z * [new branch] gh/janeyx99/330/head -> origin/gh/janeyx99/330/head 2025-12-04T08:57:43.7825064Z * [new branch] gh/janeyx99/330/orig -> origin/gh/janeyx99/330/orig 2025-12-04T08:57:43.7826492Z * [new branch] gh/janeyx99/331/base -> origin/gh/janeyx99/331/base 2025-12-04T08:57:43.7827706Z * [new branch] gh/janeyx99/331/head -> origin/gh/janeyx99/331/head 2025-12-04T08:57:43.7828819Z * [new branch] gh/janeyx99/331/orig -> origin/gh/janeyx99/331/orig 2025-12-04T08:57:43.7830505Z * [new branch] gh/janeyx99/332/base -> origin/gh/janeyx99/332/base 2025-12-04T08:57:43.7831496Z * [new branch] gh/janeyx99/332/head -> origin/gh/janeyx99/332/head 2025-12-04T08:57:43.7832631Z * [new branch] gh/janeyx99/332/orig -> origin/gh/janeyx99/332/orig 2025-12-04T08:57:43.7834116Z * [new branch] gh/janeyx99/333/base -> origin/gh/janeyx99/333/base 2025-12-04T08:57:43.7835190Z * [new branch] gh/janeyx99/333/head -> origin/gh/janeyx99/333/head 2025-12-04T08:57:43.7836276Z * [new branch] gh/janeyx99/333/orig -> origin/gh/janeyx99/333/orig 2025-12-04T08:57:43.7837827Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-12-04T08:57:43.7838929Z * [new branch] gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-12-04T08:57:43.7840012Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-12-04T08:57:43.7842081Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-12-04T08:57:43.7842911Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-12-04T08:57:43.7844354Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-12-04T08:57:43.7845443Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-12-04T08:57:43.7846597Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-12-04T08:57:43.7847978Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 2025-12-04T08:57:43.7849028Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-12-04T08:57:43.7850109Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-12-04T08:57:43.7851580Z * [new branch] gh/jansel/533/base -> origin/gh/jansel/533/base 2025-12-04T08:57:43.7852658Z * [new branch] gh/jansel/533/head -> origin/gh/jansel/533/head 2025-12-04T08:57:43.7853731Z * [new branch] gh/jansel/533/orig -> origin/gh/jansel/533/orig 2025-12-04T08:57:43.7855128Z * [new branch] gh/jansel/552/base -> origin/gh/jansel/552/base 2025-12-04T08:57:43.7856420Z * [new branch] gh/jansel/552/head -> origin/gh/jansel/552/head 2025-12-04T08:57:43.7857830Z * [new branch] gh/jansel/552/orig -> origin/gh/jansel/552/orig 2025-12-04T08:57:43.7859328Z * [new branch] gh/jansel/553/base -> origin/gh/jansel/553/base 2025-12-04T08:57:43.7860451Z * [new branch] gh/jansel/553/head -> origin/gh/jansel/553/head 2025-12-04T08:57:43.7861669Z * [new branch] gh/jansel/553/orig -> origin/gh/jansel/553/orig 
2025-12-04T08:57:43.7863143Z * [new branch] gh/jansel/554/base -> origin/gh/jansel/554/base 2025-12-04T08:57:43.7864222Z * [new branch] gh/jansel/554/head -> origin/gh/jansel/554/head 2025-12-04T08:57:43.7865351Z * [new branch] gh/jansel/554/orig -> origin/gh/jansel/554/orig 2025-12-04T08:57:43.7866864Z * [new branch] gh/jansel/555/base -> origin/gh/jansel/555/base 2025-12-04T08:57:43.7867958Z * [new branch] gh/jansel/555/head -> origin/gh/jansel/555/head 2025-12-04T08:57:43.7869233Z * [new branch] gh/jansel/555/orig -> origin/gh/jansel/555/orig 2025-12-04T08:57:43.7870629Z * [new branch] gh/jansel/556/base -> origin/gh/jansel/556/base 2025-12-04T08:57:43.7871835Z * [new branch] gh/jansel/556/head -> origin/gh/jansel/556/head 2025-12-04T08:57:43.7872883Z * [new branch] gh/jansel/556/orig -> origin/gh/jansel/556/orig 2025-12-04T08:57:43.7874341Z * [new branch] gh/jansel/557/base -> origin/gh/jansel/557/base 2025-12-04T08:57:43.7875397Z * [new branch] gh/jansel/557/head -> origin/gh/jansel/557/head 2025-12-04T08:57:43.7876498Z * [new branch] gh/jansel/557/orig -> origin/gh/jansel/557/orig 2025-12-04T08:57:43.7878075Z * [new branch] gh/jansel/558/base -> origin/gh/jansel/558/base 2025-12-04T08:57:43.7879030Z * [new branch] gh/jansel/558/head -> origin/gh/jansel/558/head 2025-12-04T08:57:43.7880127Z * [new branch] gh/jansel/558/orig -> origin/gh/jansel/558/orig 2025-12-04T08:57:43.7881578Z * [new branch] gh/jansel/559/base -> origin/gh/jansel/559/base 2025-12-04T08:57:43.7882699Z * [new branch] gh/jansel/559/head -> origin/gh/jansel/559/head 2025-12-04T08:57:43.7883756Z * [new branch] gh/jansel/559/orig -> origin/gh/jansel/559/orig 2025-12-04T08:57:43.7885403Z * [new branch] gh/jansel/560/base -> origin/gh/jansel/560/base 2025-12-04T08:57:43.7886579Z * [new branch] gh/jansel/560/head -> origin/gh/jansel/560/head 2025-12-04T08:57:43.7887657Z * [new branch] gh/jansel/560/orig -> origin/gh/jansel/560/orig 2025-12-04T08:57:43.7889101Z * [new branch] gh/jansel/561/base -> origin/gh/jansel/561/base 2025-12-04T08:57:43.7890194Z * [new branch] gh/jansel/561/head -> origin/gh/jansel/561/head 2025-12-04T08:57:43.7891261Z * [new branch] gh/jansel/561/orig -> origin/gh/jansel/561/orig 2025-12-04T08:57:43.7892690Z * [new branch] gh/jansel/562/base -> origin/gh/jansel/562/base 2025-12-04T08:57:43.7893825Z * [new branch] gh/jansel/562/head -> origin/gh/jansel/562/head 2025-12-04T08:57:43.7894942Z * [new branch] gh/jansel/562/orig -> origin/gh/jansel/562/orig 2025-12-04T08:57:43.7896395Z * [new branch] gh/jansel/563/base -> origin/gh/jansel/563/base 2025-12-04T08:57:43.7897820Z * [new branch] gh/jansel/563/head -> origin/gh/jansel/563/head 2025-12-04T08:57:43.7898916Z * [new branch] gh/jansel/563/orig -> origin/gh/jansel/563/orig 2025-12-04T08:57:43.7900911Z * [new branch] gh/jansel/564/base -> origin/gh/jansel/564/base 2025-12-04T08:57:43.7902106Z * [new branch] gh/jansel/564/head -> origin/gh/jansel/564/head 2025-12-04T08:57:43.7903207Z * [new branch] gh/jansel/564/orig -> origin/gh/jansel/564/orig 2025-12-04T08:57:43.7904974Z * [new branch] gh/jansel/565/base -> origin/gh/jansel/565/base 2025-12-04T08:57:43.7905788Z * [new branch] gh/jansel/565/head -> origin/gh/jansel/565/head 2025-12-04T08:57:43.7906971Z * [new branch] gh/jansel/565/orig -> origin/gh/jansel/565/orig 2025-12-04T08:57:43.7908517Z * [new branch] gh/jansel/566/base -> origin/gh/jansel/566/base 2025-12-04T08:57:43.7909781Z * [new branch] gh/jansel/566/head -> origin/gh/jansel/566/head 2025-12-04T08:57:43.7910838Z * [new branch] 
gh/jansel/566/orig -> origin/gh/jansel/566/orig 2025-12-04T08:57:43.7912306Z * [new branch] gh/jansel/567/base -> origin/gh/jansel/567/base 2025-12-04T08:57:43.7913371Z * [new branch] gh/jansel/567/head -> origin/gh/jansel/567/head 2025-12-04T08:57:43.7914554Z * [new branch] gh/jansel/567/orig -> origin/gh/jansel/567/orig 2025-12-04T08:57:43.7916004Z * [new branch] gh/jansel/568/base -> origin/gh/jansel/568/base 2025-12-04T08:57:43.7917239Z * [new branch] gh/jansel/568/head -> origin/gh/jansel/568/head 2025-12-04T08:57:43.7918390Z * [new branch] gh/jansel/568/orig -> origin/gh/jansel/568/orig 2025-12-04T08:57:43.7919837Z * [new branch] gh/jansel/569/base -> origin/gh/jansel/569/base 2025-12-04T08:57:43.7921011Z * [new branch] gh/jansel/569/head -> origin/gh/jansel/569/head 2025-12-04T08:57:43.7925499Z * [new branch] gh/jansel/569/orig -> origin/gh/jansel/569/orig 2025-12-04T08:57:43.7927075Z * [new branch] gh/jansel/570/base -> origin/gh/jansel/570/base 2025-12-04T08:57:43.7928259Z * [new branch] gh/jansel/570/head -> origin/gh/jansel/570/head 2025-12-04T08:57:43.7929355Z * [new branch] gh/jansel/570/orig -> origin/gh/jansel/570/orig 2025-12-04T08:57:43.7930862Z * [new branch] gh/jansel/571/base -> origin/gh/jansel/571/base 2025-12-04T08:57:43.7931984Z * [new branch] gh/jansel/571/head -> origin/gh/jansel/571/head 2025-12-04T08:57:43.7933124Z * [new branch] gh/jansel/571/orig -> origin/gh/jansel/571/orig 2025-12-04T08:57:43.7934651Z * [new branch] gh/jansel/572/base -> origin/gh/jansel/572/base 2025-12-04T08:57:43.7935834Z * [new branch] gh/jansel/572/head -> origin/gh/jansel/572/head 2025-12-04T08:57:43.7937253Z * [new branch] gh/jansel/572/orig -> origin/gh/jansel/572/orig 2025-12-04T08:57:43.7938886Z * [new branch] gh/jansel/573/base -> origin/gh/jansel/573/base 2025-12-04T08:57:43.7940021Z * [new branch] gh/jansel/573/head -> origin/gh/jansel/573/head 2025-12-04T08:57:43.7941154Z * [new branch] gh/jansel/573/orig -> origin/gh/jansel/573/orig 2025-12-04T08:57:43.7942713Z * [new branch] gh/jansel/574/base -> origin/gh/jansel/574/base 2025-12-04T08:57:43.7943904Z * [new branch] gh/jansel/574/head -> origin/gh/jansel/574/head 2025-12-04T08:57:43.7945013Z * [new branch] gh/jansel/574/orig -> origin/gh/jansel/574/orig 2025-12-04T08:57:43.7946560Z * [new branch] gh/jansel/575/base -> origin/gh/jansel/575/base 2025-12-04T08:57:43.7947684Z * [new branch] gh/jansel/575/head -> origin/gh/jansel/575/head 2025-12-04T08:57:43.7948917Z * [new branch] gh/jansel/575/orig -> origin/gh/jansel/575/orig 2025-12-04T08:57:43.7950379Z * [new branch] gh/jansel/576/base -> origin/gh/jansel/576/base 2025-12-04T08:57:43.7951547Z * [new branch] gh/jansel/576/head -> origin/gh/jansel/576/head 2025-12-04T08:57:43.7952779Z * [new branch] gh/jansel/576/orig -> origin/gh/jansel/576/orig 2025-12-04T08:57:43.7955045Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-12-04T08:57:43.7956186Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-12-04T08:57:43.7957258Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-12-04T08:57:43.7958805Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 2025-12-04T08:57:43.7959800Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-12-04T08:57:43.7960888Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 2025-12-04T08:57:43.7962748Z * [new branch] gh/jerryzh168/1/base -> origin/gh/jerryzh168/1/base 2025-12-04T08:57:43.7963880Z * [new 
branch] gh/jerryzh168/1/head -> origin/gh/jerryzh168/1/head 2025-12-04T08:57:43.7964815Z * [new branch] gh/jerryzh168/1/orig -> origin/gh/jerryzh168/1/orig 2025-12-04T08:57:43.7966689Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-12-04T08:57:43.7967787Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-12-04T08:57:43.7968876Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-12-04T08:57:43.7970313Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-12-04T08:57:43.7971404Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-12-04T08:57:43.7972478Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-12-04T08:57:43.7973926Z * [new branch] gh/jiayisunx/68/base -> origin/gh/jiayisunx/68/base 2025-12-04T08:57:43.7974976Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-12-04T08:57:43.7976077Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-12-04T08:57:43.7977888Z * [new branch] gh/jiayisunx/77/base -> origin/gh/jiayisunx/77/base 2025-12-04T08:57:43.7979010Z * [new branch] gh/jiayisunx/77/head -> origin/gh/jiayisunx/77/head 2025-12-04T08:57:43.7980240Z * [new branch] gh/jiayisunx/77/orig -> origin/gh/jiayisunx/77/orig 2025-12-04T08:57:43.7981761Z * [new branch] gh/jiayisunx/78/base -> origin/gh/jiayisunx/78/base 2025-12-04T08:57:43.7982820Z * [new branch] gh/jiayisunx/78/head -> origin/gh/jiayisunx/78/head 2025-12-04T08:57:43.7984006Z * [new branch] gh/jiayisunx/78/orig -> origin/gh/jiayisunx/78/orig 2025-12-04T08:57:43.7985437Z * [new branch] gh/jiayisunx/79/base -> origin/gh/jiayisunx/79/base 2025-12-04T08:57:43.7986542Z * [new branch] gh/jiayisunx/79/head -> origin/gh/jiayisunx/79/head 2025-12-04T08:57:43.7987657Z * [new branch] gh/jiayisunx/79/orig -> origin/gh/jiayisunx/79/orig 2025-12-04T08:57:43.7989307Z * [new branch] gh/jiayisunx/82/base -> origin/gh/jiayisunx/82/base 2025-12-04T08:57:43.7990376Z * [new branch] gh/jiayisunx/82/head -> origin/gh/jiayisunx/82/head 2025-12-04T08:57:43.7991510Z * [new branch] gh/jiayisunx/82/orig -> origin/gh/jiayisunx/82/orig 2025-12-04T08:57:43.7992845Z * [new branch] gh/jiayisunx/83/base -> origin/gh/jiayisunx/83/base 2025-12-04T08:57:43.7993995Z * [new branch] gh/jiayisunx/83/head -> origin/gh/jiayisunx/83/head 2025-12-04T08:57:43.7995173Z * [new branch] gh/jiayisunx/83/orig -> origin/gh/jiayisunx/83/orig 2025-12-04T08:57:43.7996539Z * [new branch] gh/jiayisunx/84/base -> origin/gh/jiayisunx/84/base 2025-12-04T08:57:43.7998103Z * [new branch] gh/jiayisunx/84/head -> origin/gh/jiayisunx/84/head 2025-12-04T08:57:43.7999213Z * [new branch] gh/jiayisunx/84/orig -> origin/gh/jiayisunx/84/orig 2025-12-04T08:57:43.8000662Z * [new branch] gh/jiayisunx/85/base -> origin/gh/jiayisunx/85/base 2025-12-04T08:57:43.8001727Z * [new branch] gh/jiayisunx/85/head -> origin/gh/jiayisunx/85/head 2025-12-04T08:57:43.8002800Z * [new branch] gh/jiayisunx/85/orig -> origin/gh/jiayisunx/85/orig 2025-12-04T08:57:43.8004186Z * [new branch] gh/jiayisunx/86/base -> origin/gh/jiayisunx/86/base 2025-12-04T08:57:43.8005260Z * [new branch] gh/jiayisunx/86/head -> origin/gh/jiayisunx/86/head 2025-12-04T08:57:43.8006378Z * [new branch] gh/jiayisunx/86/orig -> origin/gh/jiayisunx/86/orig 2025-12-04T08:57:43.8008066Z * [new branch] gh/jiayisunx/87/base -> origin/gh/jiayisunx/87/base 2025-12-04T08:57:43.8008893Z * [new branch] gh/jiayisunx/87/head -> origin/gh/jiayisunx/87/head 2025-12-04T08:57:43.8010085Z * [new 
branch] gh/jiayisunx/87/orig -> origin/gh/jiayisunx/87/orig 2025-12-04T08:57:43.8011495Z * [new branch] gh/jiayisunx/88/base -> origin/gh/jiayisunx/88/base 2025-12-04T08:57:43.8012613Z * [new branch] gh/jiayisunx/88/head -> origin/gh/jiayisunx/88/head 2025-12-04T08:57:43.8013708Z * [new branch] gh/jiayisunx/88/orig -> origin/gh/jiayisunx/88/orig 2025-12-04T08:57:43.8015118Z * [new branch] gh/jiayisunx/89/base -> origin/gh/jiayisunx/89/base 2025-12-04T08:57:43.8016176Z * [new branch] gh/jiayisunx/89/head -> origin/gh/jiayisunx/89/head 2025-12-04T08:57:43.8017837Z * [new branch] gh/jiayisunx/89/orig -> origin/gh/jiayisunx/89/orig 2025-12-04T08:57:43.8019326Z * [new branch] gh/jiayisunx/90/base -> origin/gh/jiayisunx/90/base 2025-12-04T08:57:43.8020393Z * [new branch] gh/jiayisunx/90/head -> origin/gh/jiayisunx/90/head 2025-12-04T08:57:43.8021806Z * [new branch] gh/jiayisunx/90/orig -> origin/gh/jiayisunx/90/orig 2025-12-04T08:57:43.8023486Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-12-04T08:57:43.8024691Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-12-04T08:57:43.8026399Z * [new branch] gh/jturney/1/base -> origin/gh/jturney/1/base 2025-12-04T08:57:43.8027514Z * [new branch] gh/jturney/1/head -> origin/gh/jturney/1/head 2025-12-04T08:57:43.8028647Z * [new branch] gh/jturney/1/orig -> origin/gh/jturney/1/orig 2025-12-04T08:57:43.8030130Z * [new branch] gh/jturney/2/base -> origin/gh/jturney/2/base 2025-12-04T08:57:43.8031246Z * [new branch] gh/jturney/2/head -> origin/gh/jturney/2/head 2025-12-04T08:57:43.8032331Z * [new branch] gh/jturney/2/orig -> origin/gh/jturney/2/orig 2025-12-04T08:57:43.8034278Z * [new branch] gh/karthickai/10/base -> origin/gh/karthickai/10/base 2025-12-04T08:57:43.8035480Z * [new branch] gh/karthickai/10/head -> origin/gh/karthickai/10/head 2025-12-04T08:57:43.8036610Z * [new branch] gh/karthickai/10/orig -> origin/gh/karthickai/10/orig 2025-12-04T08:57:43.8038137Z * [new branch] gh/karthickai/11/base -> origin/gh/karthickai/11/base 2025-12-04T08:57:43.8039321Z * [new branch] gh/karthickai/11/head -> origin/gh/karthickai/11/head 2025-12-04T08:57:43.8040421Z * [new branch] gh/karthickai/11/orig -> origin/gh/karthickai/11/orig 2025-12-04T08:57:43.8042297Z * [new branch] gh/karthickai/12/base -> origin/gh/karthickai/12/base 2025-12-04T08:57:43.8043482Z * [new branch] gh/karthickai/12/head -> origin/gh/karthickai/12/head 2025-12-04T08:57:43.8044612Z * [new branch] gh/karthickai/12/orig -> origin/gh/karthickai/12/orig 2025-12-04T08:57:43.8046067Z * [new branch] gh/karthickai/13/base -> origin/gh/karthickai/13/base 2025-12-04T08:57:43.8047252Z * [new branch] gh/karthickai/13/head -> origin/gh/karthickai/13/head 2025-12-04T08:57:43.8048317Z * [new branch] gh/karthickai/13/orig -> origin/gh/karthickai/13/orig 2025-12-04T08:57:43.8049997Z * [new branch] gh/karthickai/14/base -> origin/gh/karthickai/14/base 2025-12-04T08:57:43.8051166Z * [new branch] gh/karthickai/14/head -> origin/gh/karthickai/14/head 2025-12-04T08:57:43.8052278Z * [new branch] gh/karthickai/14/orig -> origin/gh/karthickai/14/orig 2025-12-04T08:57:43.8054635Z * [new branch] gh/karthickai/15/base -> origin/gh/karthickai/15/base 2025-12-04T08:57:43.8055685Z * [new branch] gh/karthickai/15/head -> origin/gh/karthickai/15/head 2025-12-04T08:57:43.8057082Z * [new branch] gh/karthickai/15/orig -> origin/gh/karthickai/15/orig 2025-12-04T08:57:43.8058602Z * [new branch] gh/karthickai/16/base -> origin/gh/karthickai/16/base 
2025-12-04T08:57:43.8059779Z * [new branch] gh/karthickai/16/head -> origin/gh/karthickai/16/head 2025-12-04T08:57:43.8060913Z * [new branch] gh/karthickai/16/orig -> origin/gh/karthickai/16/orig 2025-12-04T08:57:43.8062347Z * [new branch] gh/karthickai/17/base -> origin/gh/karthickai/17/base 2025-12-04T08:57:43.8063374Z * [new branch] gh/karthickai/17/head -> origin/gh/karthickai/17/head 2025-12-04T08:57:43.8064529Z * [new branch] gh/karthickai/17/orig -> origin/gh/karthickai/17/orig 2025-12-04T08:57:43.8066163Z * [new branch] gh/karthickai/18/base -> origin/gh/karthickai/18/base 2025-12-04T08:57:43.8067507Z * [new branch] gh/karthickai/18/head -> origin/gh/karthickai/18/head 2025-12-04T08:57:43.8068907Z * [new branch] gh/karthickai/18/orig -> origin/gh/karthickai/18/orig 2025-12-04T08:57:43.8071008Z * [new branch] gh/karthickai/19/base -> origin/gh/karthickai/19/base 2025-12-04T08:57:43.8072150Z * [new branch] gh/karthickai/19/head -> origin/gh/karthickai/19/head 2025-12-04T08:57:43.8073353Z * [new branch] gh/karthickai/19/orig -> origin/gh/karthickai/19/orig 2025-12-04T08:57:43.8075570Z * [new branch] gh/karthickai/20/base -> origin/gh/karthickai/20/base 2025-12-04T08:57:43.8077494Z * [new branch] gh/karthickai/20/head -> origin/gh/karthickai/20/head 2025-12-04T08:57:43.8078622Z * [new branch] gh/karthickai/20/orig -> origin/gh/karthickai/20/orig 2025-12-04T08:57:43.8080173Z * [new branch] gh/karthickai/21/base -> origin/gh/karthickai/21/base 2025-12-04T08:57:43.8081480Z * [new branch] gh/karthickai/21/head -> origin/gh/karthickai/21/head 2025-12-04T08:57:43.8082648Z * [new branch] gh/karthickai/21/orig -> origin/gh/karthickai/21/orig 2025-12-04T08:57:43.8084271Z * [new branch] gh/karthickai/22/base -> origin/gh/karthickai/22/base 2025-12-04T08:57:43.8085289Z * [new branch] gh/karthickai/22/head -> origin/gh/karthickai/22/head 2025-12-04T08:57:43.8086373Z * [new branch] gh/karthickai/22/orig -> origin/gh/karthickai/22/orig 2025-12-04T08:57:43.8088231Z * [new branch] gh/karthickai/23/base -> origin/gh/karthickai/23/base 2025-12-04T08:57:43.8089510Z * [new branch] gh/karthickai/23/head -> origin/gh/karthickai/23/head 2025-12-04T08:57:43.8090596Z * [new branch] gh/karthickai/23/orig -> origin/gh/karthickai/23/orig 2025-12-04T08:57:43.8092040Z * [new branch] gh/karthickai/24/base -> origin/gh/karthickai/24/base 2025-12-04T08:57:43.8093147Z * [new branch] gh/karthickai/24/head -> origin/gh/karthickai/24/head 2025-12-04T08:57:43.8094256Z * [new branch] gh/karthickai/24/orig -> origin/gh/karthickai/24/orig 2025-12-04T08:57:43.8096373Z * [new branch] gh/karthickai/25/base -> origin/gh/karthickai/25/base 2025-12-04T08:57:43.8097946Z * [new branch] gh/karthickai/25/head -> origin/gh/karthickai/25/head 2025-12-04T08:57:43.8099097Z * [new branch] gh/karthickai/25/orig -> origin/gh/karthickai/25/orig 2025-12-04T08:57:43.8100523Z * [new branch] gh/karthickai/26/base -> origin/gh/karthickai/26/base 2025-12-04T08:57:43.8101754Z * [new branch] gh/karthickai/26/head -> origin/gh/karthickai/26/head 2025-12-04T08:57:43.8102968Z * [new branch] gh/karthickai/26/orig -> origin/gh/karthickai/26/orig 2025-12-04T08:57:43.8106148Z * [new branch] gh/karthickai/6/base -> origin/gh/karthickai/6/base 2025-12-04T08:57:43.8107922Z * [new branch] gh/karthickai/6/head -> origin/gh/karthickai/6/head 2025-12-04T08:57:43.8109202Z * [new branch] gh/karthickai/6/orig -> origin/gh/karthickai/6/orig 2025-12-04T08:57:43.8110986Z * [new branch] gh/krocki/1/base -> origin/gh/krocki/1/base 2025-12-04T08:57:43.8112060Z * [new 
branch] gh/krocki/1/head -> origin/gh/krocki/1/head 2025-12-04T08:57:43.8113147Z * [new branch] gh/krocki/1/orig -> origin/gh/krocki/1/orig 2025-12-04T08:57:43.8114624Z * [new branch] gh/krocki/2/base -> origin/gh/krocki/2/base 2025-12-04T08:57:43.8115692Z * [new branch] gh/krocki/2/head -> origin/gh/krocki/2/head 2025-12-04T08:57:43.8116784Z * [new branch] gh/krocki/2/orig -> origin/gh/krocki/2/orig 2025-12-04T08:57:43.8118488Z * [new branch] gh/kurtamohler/60/base -> origin/gh/kurtamohler/60/base 2025-12-04T08:57:43.8119566Z * [new branch] gh/kurtamohler/60/head -> origin/gh/kurtamohler/60/head 2025-12-04T08:57:43.8120878Z * [new branch] gh/kurtamohler/60/orig -> origin/gh/kurtamohler/60/orig 2025-12-04T08:57:43.8122691Z * [new branch] gh/kurtamohler/61/base -> origin/gh/kurtamohler/61/base 2025-12-04T08:57:43.8123820Z * [new branch] gh/kurtamohler/61/head -> origin/gh/kurtamohler/61/head 2025-12-04T08:57:43.8124994Z * [new branch] gh/kurtamohler/61/orig -> origin/gh/kurtamohler/61/orig 2025-12-04T08:57:43.8126501Z * [new branch] gh/kurtamohler/62/base -> origin/gh/kurtamohler/62/base 2025-12-04T08:57:43.8127613Z * [new branch] gh/kurtamohler/62/head -> origin/gh/kurtamohler/62/head 2025-12-04T08:57:43.8128735Z * [new branch] gh/kurtamohler/62/orig -> origin/gh/kurtamohler/62/orig 2025-12-04T08:57:43.8130248Z * [new branch] gh/kurtamohler/63/base -> origin/gh/kurtamohler/63/base 2025-12-04T08:57:43.8131389Z * [new branch] gh/kurtamohler/63/head -> origin/gh/kurtamohler/63/head 2025-12-04T08:57:43.8132545Z * [new branch] gh/kurtamohler/63/orig -> origin/gh/kurtamohler/63/orig 2025-12-04T08:57:43.8134089Z * [new branch] gh/kurtamohler/64/base -> origin/gh/kurtamohler/64/base 2025-12-04T08:57:43.8135205Z * [new branch] gh/kurtamohler/64/head -> origin/gh/kurtamohler/64/head 2025-12-04T08:57:43.8136419Z * [new branch] gh/kurtamohler/64/orig -> origin/gh/kurtamohler/64/orig 2025-12-04T08:57:43.8138250Z * [new branch] gh/kurtamohler/65/base -> origin/gh/kurtamohler/65/base 2025-12-04T08:57:43.8139345Z * [new branch] gh/kurtamohler/65/head -> origin/gh/kurtamohler/65/head 2025-12-04T08:57:43.8140447Z * [new branch] gh/kurtamohler/65/orig -> origin/gh/kurtamohler/65/orig 2025-12-04T08:57:43.8141885Z * [new branch] gh/kurtamohler/66/base -> origin/gh/kurtamohler/66/base 2025-12-04T08:57:43.8142991Z * [new branch] gh/kurtamohler/66/head -> origin/gh/kurtamohler/66/head 2025-12-04T08:57:43.8144091Z * [new branch] gh/kurtamohler/66/orig -> origin/gh/kurtamohler/66/orig 2025-12-04T08:57:43.8145555Z * [new branch] gh/kurtamohler/67/base -> origin/gh/kurtamohler/67/base 2025-12-04T08:57:43.8146650Z * [new branch] gh/kurtamohler/67/head -> origin/gh/kurtamohler/67/head 2025-12-04T08:57:43.8147772Z * [new branch] gh/kurtamohler/67/orig -> origin/gh/kurtamohler/67/orig 2025-12-04T08:57:43.8149766Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-12-04T08:57:43.8151249Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-12-04T08:57:43.8152326Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-12-04T08:57:43.8153788Z * [new branch] gh/kwen2501/170/base -> origin/gh/kwen2501/170/base 2025-12-04T08:57:43.8154853Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-12-04T08:57:43.8156394Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-12-04T08:57:43.8157642Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-12-04T08:57:43.8158812Z * [new branch] gh/kwen2501/187/orig -> 
origin/gh/kwen2501/187/orig 2025-12-04T08:57:43.8160247Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-12-04T08:57:43.8161315Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-12-04T08:57:43.8162460Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-12-04T08:57:43.8163886Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-12-04T08:57:43.8164966Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-12-04T08:57:43.8166482Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-12-04T08:57:43.8167992Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-12-04T08:57:43.8169099Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-12-04T08:57:43.8170505Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-12-04T08:57:43.8171587Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-12-04T08:57:43.8172667Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 2025-12-04T08:57:43.8174774Z * [new branch] gh/kwen2501/234/base -> origin/gh/kwen2501/234/base 2025-12-04T08:57:43.8175869Z * [new branch] gh/kwen2501/234/head -> origin/gh/kwen2501/234/head 2025-12-04T08:57:43.8177299Z * [new branch] gh/kwen2501/234/orig -> origin/gh/kwen2501/234/orig 2025-12-04T08:57:43.8178773Z * [new branch] gh/kwen2501/235/base -> origin/gh/kwen2501/235/base 2025-12-04T08:57:43.8179869Z * [new branch] gh/kwen2501/235/head -> origin/gh/kwen2501/235/head 2025-12-04T08:57:43.8181001Z * [new branch] gh/kwen2501/235/orig -> origin/gh/kwen2501/235/orig 2025-12-04T08:57:43.8182563Z * [new branch] gh/kwen2501/236/base -> origin/gh/kwen2501/236/base 2025-12-04T08:57:43.8183671Z * [new branch] gh/kwen2501/236/head -> origin/gh/kwen2501/236/head 2025-12-04T08:57:43.8184841Z * [new branch] gh/kwen2501/236/orig -> origin/gh/kwen2501/236/orig 2025-12-04T08:57:43.8186255Z * [new branch] gh/kwen2501/237/base -> origin/gh/kwen2501/237/base 2025-12-04T08:57:43.8187363Z * [new branch] gh/kwen2501/237/head -> origin/gh/kwen2501/237/head 2025-12-04T08:57:43.8188459Z * [new branch] gh/kwen2501/237/orig -> origin/gh/kwen2501/237/orig 2025-12-04T08:57:43.8190005Z * [new branch] gh/kwen2501/238/base -> origin/gh/kwen2501/238/base 2025-12-04T08:57:43.8191068Z * [new branch] gh/kwen2501/238/head -> origin/gh/kwen2501/238/head 2025-12-04T08:57:43.8192225Z * [new branch] gh/kwen2501/238/orig -> origin/gh/kwen2501/238/orig 2025-12-04T08:57:43.8193657Z * [new branch] gh/kwen2501/240/base -> origin/gh/kwen2501/240/base 2025-12-04T08:57:43.8194729Z * [new branch] gh/kwen2501/240/head -> origin/gh/kwen2501/240/head 2025-12-04T08:57:43.8195871Z * [new branch] gh/kwen2501/240/orig -> origin/gh/kwen2501/240/orig 2025-12-04T08:57:43.8197337Z * [new branch] gh/kwen2501/241/base -> origin/gh/kwen2501/241/base 2025-12-04T08:57:43.8198448Z * [new branch] gh/kwen2501/241/head -> origin/gh/kwen2501/241/head 2025-12-04T08:57:43.8199491Z * [new branch] gh/kwen2501/241/orig -> origin/gh/kwen2501/241/orig 2025-12-04T08:57:43.8200988Z * [new branch] gh/kwen2501/247/base -> origin/gh/kwen2501/247/base 2025-12-04T08:57:43.8202070Z * [new branch] gh/kwen2501/247/head -> origin/gh/kwen2501/247/head 2025-12-04T08:57:43.8203198Z * [new branch] gh/kwen2501/247/orig -> origin/gh/kwen2501/247/orig 2025-12-04T08:57:43.8204556Z * [new branch] gh/kwen2501/252/base -> origin/gh/kwen2501/252/base 2025-12-04T08:57:43.8205603Z * [new branch] gh/kwen2501/252/head -> 
origin/gh/kwen2501/252/head 2025-12-04T08:57:43.8206695Z * [new branch] gh/kwen2501/252/orig -> origin/gh/kwen2501/252/orig 2025-12-04T08:57:43.8208687Z * [new branch] gh/kwen2501/259/base -> origin/gh/kwen2501/259/base 2025-12-04T08:57:43.8209878Z * [new branch] gh/kwen2501/259/head -> origin/gh/kwen2501/259/head 2025-12-04T08:57:43.8210995Z * [new branch] gh/kwen2501/259/orig -> origin/gh/kwen2501/259/orig 2025-12-04T08:57:43.8212609Z * [new branch] gh/kwen2501/260/base -> origin/gh/kwen2501/260/base 2025-12-04T08:57:43.8213806Z * [new branch] gh/kwen2501/260/head -> origin/gh/kwen2501/260/head 2025-12-04T08:57:43.8214905Z * [new branch] gh/kwen2501/260/orig -> origin/gh/kwen2501/260/orig 2025-12-04T08:57:43.8216448Z * [new branch] gh/kwen2501/268/base -> origin/gh/kwen2501/268/base 2025-12-04T08:57:43.8217846Z * [new branch] gh/kwen2501/268/head -> origin/gh/kwen2501/268/head 2025-12-04T08:57:43.8218928Z * [new branch] gh/kwen2501/268/orig -> origin/gh/kwen2501/268/orig 2025-12-04T08:57:43.8220464Z * [new branch] gh/kwen2501/269/base -> origin/gh/kwen2501/269/base 2025-12-04T08:57:43.8222001Z * [new branch] gh/kwen2501/269/head -> origin/gh/kwen2501/269/head 2025-12-04T08:57:43.8223326Z * [new branch] gh/kwen2501/269/orig -> origin/gh/kwen2501/269/orig 2025-12-04T08:57:43.8224891Z * [new branch] gh/kwen2501/270/base -> origin/gh/kwen2501/270/base 2025-12-04T08:57:43.8226129Z * [new branch] gh/kwen2501/270/head -> origin/gh/kwen2501/270/head 2025-12-04T08:57:43.8227277Z * [new branch] gh/kwen2501/270/orig -> origin/gh/kwen2501/270/orig 2025-12-04T08:57:43.8229018Z * [new branch] gh/kwen2501/271/base -> origin/gh/kwen2501/271/base 2025-12-04T08:57:43.8230180Z * [new branch] gh/kwen2501/271/head -> origin/gh/kwen2501/271/head 2025-12-04T08:57:43.8231315Z * [new branch] gh/kwen2501/271/orig -> origin/gh/kwen2501/271/orig 2025-12-04T08:57:43.8233027Z * [new branch] gh/kwen2501/274/base -> origin/gh/kwen2501/274/base 2025-12-04T08:57:43.8234281Z * [new branch] gh/kwen2501/274/head -> origin/gh/kwen2501/274/head 2025-12-04T08:57:43.8235389Z * [new branch] gh/kwen2501/274/orig -> origin/gh/kwen2501/274/orig 2025-12-04T08:57:43.8236987Z * [new branch] gh/kwen2501/275/base -> origin/gh/kwen2501/275/base 2025-12-04T08:57:43.8238351Z * [new branch] gh/kwen2501/275/head -> origin/gh/kwen2501/275/head 2025-12-04T08:57:43.8239476Z * [new branch] gh/kwen2501/275/orig -> origin/gh/kwen2501/275/orig 2025-12-04T08:57:43.8240915Z * [new branch] gh/kwen2501/276/base -> origin/gh/kwen2501/276/base 2025-12-04T08:57:43.8242150Z * [new branch] gh/kwen2501/276/head -> origin/gh/kwen2501/276/head 2025-12-04T08:57:43.8243052Z * [new branch] gh/kwen2501/276/orig -> origin/gh/kwen2501/276/orig 2025-12-04T08:57:43.8244710Z * [new branch] gh/kwen2501/277/base -> origin/gh/kwen2501/277/base 2025-12-04T08:57:43.8245765Z * [new branch] gh/kwen2501/277/head -> origin/gh/kwen2501/277/head 2025-12-04T08:57:43.8246867Z * [new branch] gh/kwen2501/277/orig -> origin/gh/kwen2501/277/orig 2025-12-04T08:57:43.8248413Z * [new branch] gh/kwen2501/278/base -> origin/gh/kwen2501/278/base 2025-12-04T08:57:43.8249479Z * [new branch] gh/kwen2501/278/head -> origin/gh/kwen2501/278/head 2025-12-04T08:57:43.8250564Z * [new branch] gh/kwen2501/278/orig -> origin/gh/kwen2501/278/orig 2025-12-04T08:57:43.8252105Z * [new branch] gh/kwen2501/279/base -> origin/gh/kwen2501/279/base 2025-12-04T08:57:43.8253299Z * [new branch] gh/kwen2501/279/head -> origin/gh/kwen2501/279/head 2025-12-04T08:57:43.8254510Z * [new branch] gh/kwen2501/279/orig -> 
origin/gh/kwen2501/279/orig 2025-12-04T08:57:43.8255984Z * [new branch] gh/kwen2501/280/base -> origin/gh/kwen2501/280/base 2025-12-04T08:57:43.8257473Z * [new branch] gh/kwen2501/280/head -> origin/gh/kwen2501/280/head 2025-12-04T08:57:43.8258620Z * [new branch] gh/kwen2501/280/orig -> origin/gh/kwen2501/280/orig 2025-12-04T08:57:43.8260269Z * [new branch] gh/kwen2501/281/base -> origin/gh/kwen2501/281/base 2025-12-04T08:57:43.8261357Z * [new branch] gh/kwen2501/281/head -> origin/gh/kwen2501/281/head 2025-12-04T08:57:43.8262526Z * [new branch] gh/kwen2501/281/orig -> origin/gh/kwen2501/281/orig 2025-12-04T08:57:43.8264125Z * [new branch] gh/kwen2501/282/base -> origin/gh/kwen2501/282/base 2025-12-04T08:57:43.8265292Z * [new branch] gh/kwen2501/282/head -> origin/gh/kwen2501/282/head 2025-12-04T08:57:43.8266454Z * [new branch] gh/kwen2501/282/orig -> origin/gh/kwen2501/282/orig 2025-12-04T08:57:43.8267970Z * [new branch] gh/kwen2501/283/base -> origin/gh/kwen2501/283/base 2025-12-04T08:57:43.8269260Z * [new branch] gh/kwen2501/283/head -> origin/gh/kwen2501/283/head 2025-12-04T08:57:43.8270387Z * [new branch] gh/kwen2501/283/orig -> origin/gh/kwen2501/283/orig 2025-12-04T08:57:43.8271891Z * [new branch] gh/kwen2501/284/base -> origin/gh/kwen2501/284/base 2025-12-04T08:57:43.8273079Z * [new branch] gh/kwen2501/284/head -> origin/gh/kwen2501/284/head 2025-12-04T08:57:43.8274220Z * [new branch] gh/kwen2501/284/orig -> origin/gh/kwen2501/284/orig 2025-12-04T08:57:43.8275899Z * [new branch] gh/kwen2501/285/base -> origin/gh/kwen2501/285/base 2025-12-04T08:57:43.8276959Z * [new branch] gh/kwen2501/285/head -> origin/gh/kwen2501/285/head 2025-12-04T08:57:43.8278046Z * [new branch] gh/kwen2501/285/orig -> origin/gh/kwen2501/285/orig 2025-12-04T08:57:43.8279502Z * [new branch] gh/kwen2501/286/base -> origin/gh/kwen2501/286/base 2025-12-04T08:57:43.8280633Z * [new branch] gh/kwen2501/286/head -> origin/gh/kwen2501/286/head 2025-12-04T08:57:43.8281742Z * [new branch] gh/kwen2501/286/orig -> origin/gh/kwen2501/286/orig 2025-12-04T08:57:43.8283064Z * [new branch] gh/kwen2501/287/base -> origin/gh/kwen2501/287/base 2025-12-04T08:57:43.8284183Z * [new branch] gh/kwen2501/287/head -> origin/gh/kwen2501/287/head 2025-12-04T08:57:43.8285270Z * [new branch] gh/kwen2501/287/orig -> origin/gh/kwen2501/287/orig 2025-12-04T08:57:43.8286873Z * [new branch] gh/kwen2501/288/base -> origin/gh/kwen2501/288/base 2025-12-04T08:57:43.8287880Z * [new branch] gh/kwen2501/288/head -> origin/gh/kwen2501/288/head 2025-12-04T08:57:43.8289643Z * [new branch] gh/kwen2501/288/orig -> origin/gh/kwen2501/288/orig 2025-12-04T08:57:43.8291982Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-12-04T08:57:43.8293067Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-12-04T08:57:43.8294159Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-12-04T08:57:43.8295588Z * [new branch] gh/laithsakka/276/base -> origin/gh/laithsakka/276/base 2025-12-04T08:57:43.8296850Z * [new branch] gh/laithsakka/276/head -> origin/gh/laithsakka/276/head 2025-12-04T08:57:43.8298092Z * [new branch] gh/laithsakka/276/orig -> origin/gh/laithsakka/276/orig 2025-12-04T08:57:43.8299753Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-12-04T08:57:43.8301147Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-12-04T08:57:43.8302908Z * [new branch] gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-12-04T08:57:43.8304055Z * [new 
branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-12-04T08:57:43.8305474Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-12-04T08:57:43.8306999Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-12-04T08:57:43.8308753Z * [new branch] gh/laithsakka/313/base -> origin/gh/laithsakka/313/base 2025-12-04T08:57:43.8309810Z * [new branch] gh/laithsakka/313/head -> origin/gh/laithsakka/313/head 2025-12-04T08:57:43.8310953Z * [new branch] gh/laithsakka/313/orig -> origin/gh/laithsakka/313/orig 2025-12-04T08:57:43.8312688Z * [new branch] gh/laithsakka/316/base -> origin/gh/laithsakka/316/base 2025-12-04T08:57:43.8313677Z * [new branch] gh/laithsakka/316/head -> origin/gh/laithsakka/316/head 2025-12-04T08:57:43.8314784Z * [new branch] gh/laithsakka/316/orig -> origin/gh/laithsakka/316/orig 2025-12-04T08:57:43.8316271Z * [new branch] gh/laithsakka/317/base -> origin/gh/laithsakka/317/base 2025-12-04T08:57:43.8317291Z * [new branch] gh/laithsakka/317/head -> origin/gh/laithsakka/317/head 2025-12-04T08:57:43.8318301Z * [new branch] gh/laithsakka/317/orig -> origin/gh/laithsakka/317/orig 2025-12-04T08:57:43.8319905Z * [new branch] gh/laithsakka/319/base -> origin/gh/laithsakka/319/base 2025-12-04T08:57:43.8321334Z * [new branch] gh/laithsakka/319/head -> origin/gh/laithsakka/319/head 2025-12-04T08:57:43.8325078Z * [new branch] gh/laithsakka/319/orig -> origin/gh/laithsakka/319/orig 2025-12-04T08:57:43.8326409Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-12-04T08:57:43.8327460Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-12-04T08:57:43.8329114Z * [new branch] gh/laithsakka/320/base -> origin/gh/laithsakka/320/base 2025-12-04T08:57:43.8330164Z * [new branch] gh/laithsakka/320/head -> origin/gh/laithsakka/320/head 2025-12-04T08:57:43.8331744Z * [new branch] gh/laithsakka/320/orig -> origin/gh/laithsakka/320/orig 2025-12-04T08:57:43.8333226Z * [new branch] gh/laithsakka/321/base -> origin/gh/laithsakka/321/base 2025-12-04T08:57:43.8334463Z * [new branch] gh/laithsakka/321/head -> origin/gh/laithsakka/321/head 2025-12-04T08:57:43.8335635Z * [new branch] gh/laithsakka/321/orig -> origin/gh/laithsakka/321/orig 2025-12-04T08:57:43.8337573Z * [new branch] gh/laithsakka/322/base -> origin/gh/laithsakka/322/base 2025-12-04T08:57:43.8339299Z * [new branch] gh/laithsakka/322/head -> origin/gh/laithsakka/322/head 2025-12-04T08:57:43.8340473Z * [new branch] gh/laithsakka/322/orig -> origin/gh/laithsakka/322/orig 2025-12-04T08:57:43.8342055Z * [new branch] gh/laithsakka/323/base -> origin/gh/laithsakka/323/base 2025-12-04T08:57:43.8343307Z * [new branch] gh/laithsakka/323/head -> origin/gh/laithsakka/323/head 2025-12-04T08:57:43.8344467Z * [new branch] gh/laithsakka/323/orig -> origin/gh/laithsakka/323/orig 2025-12-04T08:57:43.8346117Z * [new branch] gh/laithsakka/324/base -> origin/gh/laithsakka/324/base 2025-12-04T08:57:43.8347155Z * [new branch] gh/laithsakka/324/head -> origin/gh/laithsakka/324/head 2025-12-04T08:57:43.8348244Z * [new branch] gh/laithsakka/324/orig -> origin/gh/laithsakka/324/orig 2025-12-04T08:57:43.8349881Z * [new branch] gh/laithsakka/325/base -> origin/gh/laithsakka/325/base 2025-12-04T08:57:43.8350980Z * [new branch] gh/laithsakka/325/head -> origin/gh/laithsakka/325/head 2025-12-04T08:57:43.8352047Z * [new branch] gh/laithsakka/325/orig -> origin/gh/laithsakka/325/orig 2025-12-04T08:57:43.8353804Z * [new branch] gh/laithsakka/326/base -> origin/gh/laithsakka/326/base 
2025-12-04T08:57:43.8355158Z * [new branch] gh/laithsakka/326/head -> origin/gh/laithsakka/326/head 2025-12-04T08:57:43.8356287Z * [new branch] gh/laithsakka/326/orig -> origin/gh/laithsakka/326/orig 2025-12-04T08:57:43.8357806Z * [new branch] gh/laithsakka/327/base -> origin/gh/laithsakka/327/base 2025-12-04T08:57:43.8358971Z * [new branch] gh/laithsakka/327/head -> origin/gh/laithsakka/327/head 2025-12-04T08:57:43.8360265Z * [new branch] gh/laithsakka/327/orig -> origin/gh/laithsakka/327/orig 2025-12-04T08:57:43.8361807Z * [new branch] gh/laithsakka/328/base -> origin/gh/laithsakka/328/base 2025-12-04T08:57:43.8362874Z * [new branch] gh/laithsakka/328/head -> origin/gh/laithsakka/328/head 2025-12-04T08:57:43.8363928Z * [new branch] gh/laithsakka/328/orig -> origin/gh/laithsakka/328/orig 2025-12-04T08:57:43.8366095Z * [new branch] gh/liangel/4/base -> origin/gh/liangel/4/base 2025-12-04T08:57:43.8367214Z * [new branch] gh/liangel/4/head -> origin/gh/liangel/4/head 2025-12-04T08:57:43.8368289Z * [new branch] gh/liangel/4/orig -> origin/gh/liangel/4/orig 2025-12-04T08:57:43.8372078Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-12-04T08:57:43.8373166Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-12-04T08:57:43.8374846Z * [new branch] gh/lw/4/base -> origin/gh/lw/4/base 2025-12-04T08:57:43.8375901Z * [new branch] gh/lw/4/head -> origin/gh/lw/4/head 2025-12-04T08:57:43.8377343Z * [new branch] gh/lw/4/orig -> origin/gh/lw/4/orig 2025-12-04T08:57:43.8378866Z * [new branch] gh/lw/5/base -> origin/gh/lw/5/base 2025-12-04T08:57:43.8380013Z * [new branch] gh/lw/5/head -> origin/gh/lw/5/head 2025-12-04T08:57:43.8381113Z * [new branch] gh/lw/5/orig -> origin/gh/lw/5/orig 2025-12-04T08:57:43.8382567Z * [new branch] gh/lw/6/base -> origin/gh/lw/6/base 2025-12-04T08:57:43.8383668Z * [new branch] gh/lw/6/head -> origin/gh/lw/6/head 2025-12-04T08:57:43.8384831Z * [new branch] gh/lw/6/orig -> origin/gh/lw/6/orig 2025-12-04T08:57:43.8386739Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-12-04T08:57:43.8388189Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-12-04T08:57:43.8389364Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-12-04T08:57:43.8390485Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-12-04T08:57:43.8391882Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-12-04T08:57:43.8393030Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-12-04T08:57:43.8394111Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-12-04T08:57:43.8395664Z * [new branch] gh/malfet/517/base -> origin/gh/malfet/517/base 2025-12-04T08:57:43.8396764Z * [new branch] gh/malfet/517/head -> origin/gh/malfet/517/head 2025-12-04T08:57:43.8398199Z * [new branch] gh/malfet/528/base -> origin/gh/malfet/528/base 2025-12-04T08:57:43.8399398Z * [new branch] gh/malfet/528/head -> origin/gh/malfet/528/head 2025-12-04T08:57:43.8400467Z * [new branch] gh/malfet/528/orig -> origin/gh/malfet/528/orig 2025-12-04T08:57:43.8401952Z * [new branch] gh/malfet/537/base -> origin/gh/malfet/537/base 2025-12-04T08:57:43.8403015Z * [new branch] gh/malfet/537/head -> origin/gh/malfet/537/head 2025-12-04T08:57:43.8404089Z * [new branch] gh/malfet/537/orig -> origin/gh/malfet/537/orig 2025-12-04T08:57:43.8405517Z * [new branch] gh/malfet/546/base -> origin/gh/malfet/546/base 2025-12-04T08:57:43.8406579Z * [new branch] gh/malfet/546/head -> origin/gh/malfet/546/head 
2025-12-04T08:57:43.8407661Z * [new branch] gh/malfet/546/orig -> origin/gh/malfet/546/orig 2025-12-04T08:57:43.8409146Z * [new branch] gh/malfet/565/base -> origin/gh/malfet/565/base 2025-12-04T08:57:43.8410109Z * [new branch] gh/malfet/565/head -> origin/gh/malfet/565/head 2025-12-04T08:57:43.8411218Z * [new branch] gh/malfet/565/orig -> origin/gh/malfet/565/orig 2025-12-04T08:57:43.8412634Z * [new branch] gh/malfet/575/base -> origin/gh/malfet/575/base 2025-12-04T08:57:43.8413835Z * [new branch] gh/malfet/575/head -> origin/gh/malfet/575/head 2025-12-04T08:57:43.8414922Z * [new branch] gh/malfet/575/orig -> origin/gh/malfet/575/orig 2025-12-04T08:57:43.8416444Z * [new branch] gh/malfet/580/base -> origin/gh/malfet/580/base 2025-12-04T08:57:43.8417848Z * [new branch] gh/malfet/580/head -> origin/gh/malfet/580/head 2025-12-04T08:57:43.8418965Z * [new branch] gh/malfet/580/orig -> origin/gh/malfet/580/orig 2025-12-04T08:57:43.8420462Z * [new branch] gh/malfet/581/base -> origin/gh/malfet/581/base 2025-12-04T08:57:43.8421824Z * [new branch] gh/malfet/581/head -> origin/gh/malfet/581/head 2025-12-04T08:57:43.8422947Z * [new branch] gh/malfet/581/orig -> origin/gh/malfet/581/orig 2025-12-04T08:57:43.8424352Z * [new branch] gh/malfet/583/base -> origin/gh/malfet/583/base 2025-12-04T08:57:43.8425499Z * [new branch] gh/malfet/583/head -> origin/gh/malfet/583/head 2025-12-04T08:57:43.8426608Z * [new branch] gh/malfet/583/orig -> origin/gh/malfet/583/orig 2025-12-04T08:57:43.8428225Z * [new branch] gh/malfet/586/base -> origin/gh/malfet/586/base 2025-12-04T08:57:43.8429512Z * [new branch] gh/malfet/586/head -> origin/gh/malfet/586/head 2025-12-04T08:57:43.8430738Z * [new branch] gh/malfet/586/orig -> origin/gh/malfet/586/orig 2025-12-04T08:57:43.8432082Z * [new branch] gh/malfet/587/base -> origin/gh/malfet/587/base 2025-12-04T08:57:43.8433252Z * [new branch] gh/malfet/587/head -> origin/gh/malfet/587/head 2025-12-04T08:57:43.8434321Z * [new branch] gh/malfet/587/orig -> origin/gh/malfet/587/orig 2025-12-04T08:57:43.8435725Z * [new branch] gh/malfet/588/base -> origin/gh/malfet/588/base 2025-12-04T08:57:43.8436791Z * [new branch] gh/malfet/588/head -> origin/gh/malfet/588/head 2025-12-04T08:57:43.8438040Z * [new branch] gh/malfet/588/orig -> origin/gh/malfet/588/orig 2025-12-04T08:57:43.8439515Z * [new branch] gh/malfet/589/base -> origin/gh/malfet/589/base 2025-12-04T08:57:43.8440527Z * [new branch] gh/malfet/589/head -> origin/gh/malfet/589/head 2025-12-04T08:57:43.8441607Z * [new branch] gh/malfet/589/orig -> origin/gh/malfet/589/orig 2025-12-04T08:57:43.8443015Z * [new branch] gh/malfet/590/base -> origin/gh/malfet/590/base 2025-12-04T08:57:43.8444196Z * [new branch] gh/malfet/590/head -> origin/gh/malfet/590/head 2025-12-04T08:57:43.8445266Z * [new branch] gh/malfet/590/orig -> origin/gh/malfet/590/orig 2025-12-04T08:57:43.8447548Z * [new branch] gh/malfet/591/base -> origin/gh/malfet/591/base 2025-12-04T08:57:43.8448645Z * [new branch] gh/malfet/591/head -> origin/gh/malfet/591/head 2025-12-04T08:57:43.8449836Z * [new branch] gh/malfet/591/orig -> origin/gh/malfet/591/orig 2025-12-04T08:57:43.8451285Z * [new branch] gh/malfet/592/base -> origin/gh/malfet/592/base 2025-12-04T08:57:43.8452411Z * [new branch] gh/malfet/592/head -> origin/gh/malfet/592/head 2025-12-04T08:57:43.8453534Z * [new branch] gh/malfet/592/orig -> origin/gh/malfet/592/orig 2025-12-04T08:57:43.8454973Z * [new branch] gh/malfet/593/base -> origin/gh/malfet/593/base 2025-12-04T08:57:43.8456037Z * [new branch] 
gh/malfet/593/head -> origin/gh/malfet/593/head 2025-12-04T08:57:43.8457477Z * [new branch] gh/malfet/593/orig -> origin/gh/malfet/593/orig 2025-12-04T08:57:43.8459018Z * [new branch] gh/malfet/594/base -> origin/gh/malfet/594/base 2025-12-04T08:57:43.8460222Z * [new branch] gh/malfet/594/head -> origin/gh/malfet/594/head 2025-12-04T08:57:43.8461344Z * [new branch] gh/malfet/594/orig -> origin/gh/malfet/594/orig 2025-12-04T08:57:43.8462798Z * [new branch] gh/malfet/595/base -> origin/gh/malfet/595/base 2025-12-04T08:57:43.8463867Z * [new branch] gh/malfet/595/head -> origin/gh/malfet/595/head 2025-12-04T08:57:43.8465110Z * [new branch] gh/malfet/595/orig -> origin/gh/malfet/595/orig 2025-12-04T08:57:43.8466583Z * [new branch] gh/malfet/596/base -> origin/gh/malfet/596/base 2025-12-04T08:57:43.8467756Z * [new branch] gh/malfet/596/head -> origin/gh/malfet/596/head 2025-12-04T08:57:43.8468868Z * [new branch] gh/malfet/596/orig -> origin/gh/malfet/596/orig 2025-12-04T08:57:43.8470458Z * [new branch] gh/malfet/597/base -> origin/gh/malfet/597/base 2025-12-04T08:57:43.8471515Z * [new branch] gh/malfet/597/head -> origin/gh/malfet/597/head 2025-12-04T08:57:43.8472576Z * [new branch] gh/malfet/597/orig -> origin/gh/malfet/597/orig 2025-12-04T08:57:43.8474034Z * [new branch] gh/malfet/598/base -> origin/gh/malfet/598/base 2025-12-04T08:57:43.8475233Z * [new branch] gh/malfet/598/head -> origin/gh/malfet/598/head 2025-12-04T08:57:43.8476430Z * [new branch] gh/malfet/598/orig -> origin/gh/malfet/598/orig 2025-12-04T08:57:43.8477792Z * [new branch] gh/malfet/599/base -> origin/gh/malfet/599/base 2025-12-04T08:57:43.8478857Z * [new branch] gh/malfet/599/head -> origin/gh/malfet/599/head 2025-12-04T08:57:43.8479964Z * [new branch] gh/malfet/599/orig -> origin/gh/malfet/599/orig 2025-12-04T08:57:43.8481398Z * [new branch] gh/malfet/600/base -> origin/gh/malfet/600/base 2025-12-04T08:57:43.8482460Z * [new branch] gh/malfet/600/head -> origin/gh/malfet/600/head 2025-12-04T08:57:43.8501151Z * [new branch] gh/malfet/600/orig -> origin/gh/malfet/600/orig 2025-12-04T08:57:43.8501640Z * [new branch] gh/malfet/601/base -> origin/gh/malfet/601/base 2025-12-04T08:57:43.8501899Z * [new branch] gh/malfet/601/head -> origin/gh/malfet/601/head 2025-12-04T08:57:43.8502152Z * [new branch] gh/malfet/601/orig -> origin/gh/malfet/601/orig 2025-12-04T08:57:43.8502393Z * [new branch] gh/malfet/602/base -> origin/gh/malfet/602/base 2025-12-04T08:57:43.8502642Z * [new branch] gh/malfet/602/head -> origin/gh/malfet/602/head 2025-12-04T08:57:43.8502877Z * [new branch] gh/malfet/602/orig -> origin/gh/malfet/602/orig 2025-12-04T08:57:43.8503125Z * [new branch] gh/malfet/603/base -> origin/gh/malfet/603/base 2025-12-04T08:57:43.8503358Z * [new branch] gh/malfet/603/head -> origin/gh/malfet/603/head 2025-12-04T08:57:43.8503593Z * [new branch] gh/malfet/603/orig -> origin/gh/malfet/603/orig 2025-12-04T08:57:43.8503845Z * [new branch] gh/malfet/604/base -> origin/gh/malfet/604/base 2025-12-04T08:57:43.8504084Z * [new branch] gh/malfet/604/head -> origin/gh/malfet/604/head 2025-12-04T08:57:43.8504339Z * [new branch] gh/malfet/604/orig -> origin/gh/malfet/604/orig 2025-12-04T08:57:43.8504579Z * [new branch] gh/malfet/605/base -> origin/gh/malfet/605/base 2025-12-04T08:57:43.8504814Z * [new branch] gh/malfet/605/head -> origin/gh/malfet/605/head 2025-12-04T08:57:43.8505067Z * [new branch] gh/malfet/605/orig -> origin/gh/malfet/605/orig 2025-12-04T08:57:43.8505302Z * [new branch] gh/malfet/606/base -> origin/gh/malfet/606/base 
2025-12-04T08:57:43.8505781Z * [new branch] gh/malfet/606/head -> origin/gh/malfet/606/head 2025-12-04T08:57:43.8507001Z * [new branch] gh/malfet/606/orig -> origin/gh/malfet/606/orig 2025-12-04T08:57:43.8508484Z * [new branch] gh/malfet/607/base -> origin/gh/malfet/607/base 2025-12-04T08:57:43.8509715Z * [new branch] gh/malfet/607/head -> origin/gh/malfet/607/head 2025-12-04T08:57:43.8510879Z * [new branch] gh/malfet/607/orig -> origin/gh/malfet/607/orig 2025-12-04T08:57:43.8512344Z * [new branch] gh/malfet/608/base -> origin/gh/malfet/608/base 2025-12-04T08:57:43.8513422Z * [new branch] gh/malfet/608/head -> origin/gh/malfet/608/head 2025-12-04T08:57:43.8514546Z * [new branch] gh/malfet/608/orig -> origin/gh/malfet/608/orig 2025-12-04T08:57:43.8516050Z * [new branch] gh/malfet/609/base -> origin/gh/malfet/609/base 2025-12-04T08:57:43.8517113Z * [new branch] gh/malfet/609/head -> origin/gh/malfet/609/head 2025-12-04T08:57:43.8518224Z * [new branch] gh/malfet/609/orig -> origin/gh/malfet/609/orig 2025-12-04T08:57:43.8519708Z * [new branch] gh/malfet/610/base -> origin/gh/malfet/610/base 2025-12-04T08:57:43.8522013Z * [new branch] gh/malfet/610/head -> origin/gh/malfet/610/head 2025-12-04T08:57:43.8523033Z * [new branch] gh/malfet/610/orig -> origin/gh/malfet/610/orig 2025-12-04T08:57:43.8524581Z * [new branch] gh/malfet/611/base -> origin/gh/malfet/611/base 2025-12-04T08:57:43.8525721Z * [new branch] gh/malfet/611/head -> origin/gh/malfet/611/head 2025-12-04T08:57:43.8526846Z * [new branch] gh/malfet/611/orig -> origin/gh/malfet/611/orig 2025-12-04T08:57:43.8528229Z * [new branch] gh/malfet/612/base -> origin/gh/malfet/612/base 2025-12-04T08:57:43.8529760Z * [new branch] gh/malfet/612/head -> origin/gh/malfet/612/head 2025-12-04T08:57:43.8530979Z * [new branch] gh/malfet/612/orig -> origin/gh/malfet/612/orig 2025-12-04T08:57:43.8532513Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-12-04T08:57:43.8533749Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-12-04T08:57:43.8535511Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-12-04T08:57:43.8537025Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-12-04T08:57:43.8538247Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-12-04T08:57:43.8541153Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-12-04T08:57:43.8543146Z * [new branch] gh/masnesral/1/base -> origin/gh/masnesral/1/base 2025-12-04T08:57:43.8544260Z * [new branch] gh/masnesral/1/head -> origin/gh/masnesral/1/head 2025-12-04T08:57:43.8545366Z * [new branch] gh/masnesral/1/orig -> origin/gh/masnesral/1/orig 2025-12-04T08:57:43.8547602Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-12-04T08:57:43.8548850Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-12-04T08:57:43.8550160Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-12-04T08:57:43.8551339Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-12-04T08:57:43.8552657Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-12-04T08:57:43.8553759Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-12-04T08:57:43.8555074Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-12-04T08:57:43.8556111Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-12-04T08:57:43.8557410Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 
2025-12-04T08:57:43.8558489Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-12-04T08:57:43.8559764Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-12-04T08:57:43.8560773Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-12-04T08:57:43.8562261Z * [new branch] gh/mhorowitz/6/base -> origin/gh/mhorowitz/6/base 2025-12-04T08:57:43.8563286Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-12-04T08:57:43.8565234Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-12-04T08:57:43.8566351Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-12-04T08:57:43.8568230Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-12-04T08:57:43.8569251Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-12-04T08:57:43.8570731Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-12-04T08:57:43.8571666Z * [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-12-04T08:57:43.8573522Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-12-04T08:57:43.8574601Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-12-04T08:57:43.8575997Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-12-04T08:57:43.8577522Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-12-04T08:57:43.8579081Z * [new branch] gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-12-04T08:57:43.8580272Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-12-04T08:57:43.8581376Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-12-04T08:57:43.8583102Z * [new branch] gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base 2025-12-04T08:57:43.8584193Z * [new branch] gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head 2025-12-04T08:57:43.8585372Z * [new branch] gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig 2025-12-04T08:57:43.8587509Z * [new branch] gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base 2025-12-04T08:57:43.8588608Z * [new branch] gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head 2025-12-04T08:57:43.8589843Z * [new branch] gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig 2025-12-04T08:57:43.8591374Z * [new branch] gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base 2025-12-04T08:57:43.8592430Z * [new branch] gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head 2025-12-04T08:57:43.8593515Z * [new branch] gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig 2025-12-04T08:57:43.8595207Z * [new branch] gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base 2025-12-04T08:57:43.8596286Z * [new branch] gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head 2025-12-04T08:57:43.8597388Z * [new branch] gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig 2025-12-04T08:57:43.8598926Z * [new branch] gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base 2025-12-04T08:57:43.8599927Z * [new branch] gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head 2025-12-04T08:57:43.8601017Z * [new branch] 
gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig 2025-12-04T08:57:43.8602573Z * [new branch] gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base 2025-12-04T08:57:43.8603670Z * [new branch] gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head 2025-12-04T08:57:43.8604789Z * [new branch] gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig 2025-12-04T08:57:43.8606678Z * [new branch] gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base 2025-12-04T08:57:43.8607866Z * [new branch] gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head 2025-12-04T08:57:43.8608987Z * [new branch] gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig 2025-12-04T08:57:43.8610725Z * [new branch] gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base 2025-12-04T08:57:43.8611958Z * [new branch] gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head 2025-12-04T08:57:43.8613102Z * [new branch] gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig 2025-12-04T08:57:43.8614896Z * [new branch] gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base 2025-12-04T08:57:43.8616392Z * [new branch] gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head 2025-12-04T08:57:43.8617759Z * [new branch] gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig 2025-12-04T08:57:43.8619076Z * [new branch] gh/mikaylagawarecki/354/base -> origin/gh/mikaylagawarecki/354/base 2025-12-04T08:57:43.8620187Z * [new branch] gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head 2025-12-04T08:57:43.8624571Z * [new branch] gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig 2025-12-04T08:57:43.8626654Z * [new branch] gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base 2025-12-04T08:57:43.8627903Z * [new branch] gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head 2025-12-04T08:57:43.8629105Z * [new branch] gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig 2025-12-04T08:57:43.8630678Z * [new branch] gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base 2025-12-04T08:57:43.8631784Z * [new branch] gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head 2025-12-04T08:57:43.8633422Z * [new branch] gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig 2025-12-04T08:57:43.8635170Z * [new branch] gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base 2025-12-04T08:57:43.8636408Z * [new branch] gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head 2025-12-04T08:57:43.8637531Z * [new branch] gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig 2025-12-04T08:57:43.8638997Z * [new branch] gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base 2025-12-04T08:57:43.8640127Z * [new branch] gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head 2025-12-04T08:57:43.8641230Z * [new branch] gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig 2025-12-04T08:57:43.8642825Z * [new branch] gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base 2025-12-04T08:57:43.8643952Z * [new branch] gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head 2025-12-04T08:57:43.8645047Z * [new branch] gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig 2025-12-04T08:57:43.8646728Z * [new branch] gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base 
2025-12-04T08:57:43.8647994Z * [new branch] gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head 2025-12-04T08:57:43.8649585Z * [new branch] gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig 2025-12-04T08:57:43.8651447Z * [new branch] gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base 2025-12-04T08:57:43.8652640Z * [new branch] gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head 2025-12-04T08:57:43.8653873Z * [new branch] gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig 2025-12-04T08:57:43.8655868Z * [new branch] gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base 2025-12-04T08:57:43.8657282Z * [new branch] gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head 2025-12-04T08:57:43.8658445Z * [new branch] gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig 2025-12-04T08:57:43.8660228Z * [new branch] gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base 2025-12-04T08:57:43.8661374Z * [new branch] gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head 2025-12-04T08:57:43.8662829Z * [new branch] gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig 2025-12-04T08:57:43.8664843Z * [new branch] gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base 2025-12-04T08:57:43.8665857Z * [new branch] gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head 2025-12-04T08:57:43.8667059Z * [new branch] gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig 2025-12-04T08:57:43.8668573Z * [new branch] gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base 2025-12-04T08:57:43.8669780Z * [new branch] gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head 2025-12-04T08:57:43.8670901Z * [new branch] gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig 2025-12-04T08:57:43.8672472Z * [new branch] gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base 2025-12-04T08:57:43.8673560Z * [new branch] gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head 2025-12-04T08:57:43.8674658Z * [new branch] gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig 2025-12-04T08:57:43.8676146Z * [new branch] gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base 2025-12-04T08:57:43.8677270Z * [new branch] gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head 2025-12-04T08:57:43.8678376Z * [new branch] gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig 2025-12-04T08:57:43.8680065Z * [new branch] gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base 2025-12-04T08:57:43.8681180Z * [new branch] gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head 2025-12-04T08:57:43.8682236Z * [new branch] gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig 2025-12-04T08:57:43.8683796Z * [new branch] gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base 2025-12-04T08:57:43.8684870Z * [new branch] gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head 2025-12-04T08:57:43.8685897Z * [new branch] gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig 2025-12-04T08:57:43.8687396Z * [new branch] gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base 2025-12-04T08:57:43.8688530Z * [new branch] gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head 2025-12-04T08:57:43.8689619Z * [new branch] gh/mikaylagawarecki/372/orig -> 
origin/gh/mikaylagawarecki/372/orig 2025-12-04T08:57:43.8691069Z * [new branch] gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base 2025-12-04T08:57:43.8692218Z * [new branch] gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head 2025-12-04T08:57:43.8693323Z * [new branch] gh/mikaylagawarecki/373/orig -> origin/gh/mikaylagawarecki/373/orig 2025-12-04T08:57:43.8694921Z * [new branch] gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base 2025-12-04T08:57:43.8696033Z * [new branch] gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head 2025-12-04T08:57:43.8697446Z * [new branch] gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig 2025-12-04T08:57:43.8699019Z * [new branch] gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base 2025-12-04T08:57:43.8700190Z * [new branch] gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head 2025-12-04T08:57:43.8701460Z * [new branch] gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig 2025-12-04T08:57:43.8703058Z * [new branch] gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base 2025-12-04T08:57:43.8704252Z * [new branch] gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head 2025-12-04T08:57:43.8705490Z * [new branch] gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig 2025-12-04T08:57:43.8706962Z * [new branch] gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base 2025-12-04T08:57:43.8708198Z * [new branch] gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head 2025-12-04T08:57:43.8709814Z * [new branch] gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig 2025-12-04T08:57:43.8711415Z * [new branch] gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base 2025-12-04T08:57:43.8713046Z * [new branch] gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head 2025-12-04T08:57:43.8714186Z * [new branch] gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig 2025-12-04T08:57:43.8715717Z * [new branch] gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base 2025-12-04T08:57:43.8716802Z * [new branch] gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head 2025-12-04T08:57:43.8717884Z * [new branch] gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig 2025-12-04T08:57:43.8719247Z * [new branch] gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base 2025-12-04T08:57:43.8720918Z * [new branch] gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head 2025-12-04T08:57:43.8722425Z * [new branch] gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig 2025-12-04T08:57:43.8723770Z * [new branch] gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base 2025-12-04T08:57:43.8724895Z * [new branch] gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head 2025-12-04T08:57:43.8725983Z * [new branch] gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig 2025-12-04T08:57:43.8727501Z * [new branch] gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base 2025-12-04T08:57:43.8728594Z * [new branch] gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head 2025-12-04T08:57:43.8729764Z * [new branch] gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig 2025-12-04T08:57:43.8731392Z * [new branch] gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base 2025-12-04T08:57:43.8732521Z * [new branch] 
gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head 2025-12-04T08:57:43.8733816Z * [new branch] gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig 2025-12-04T08:57:43.8735320Z * [new branch] gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base 2025-12-04T08:57:43.8736482Z * [new branch] gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head 2025-12-04T08:57:43.8737890Z * [new branch] gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig 2025-12-04T08:57:43.8739442Z * [new branch] gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base 2025-12-04T08:57:43.8740629Z * [new branch] gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head 2025-12-04T08:57:43.8741791Z * [new branch] gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig 2025-12-04T08:57:43.8743560Z * [new branch] gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base 2025-12-04T08:57:43.8744632Z * [new branch] gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head 2025-12-04T08:57:43.8745817Z * [new branch] gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig 2025-12-04T08:57:43.8747344Z * [new branch] gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base 2025-12-04T08:57:43.8748589Z * [new branch] gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head 2025-12-04T08:57:43.8749765Z * [new branch] gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig 2025-12-04T08:57:43.8751088Z * [new branch] gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base 2025-12-04T08:57:43.8752189Z * [new branch] gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head 2025-12-04T08:57:43.8753282Z * [new branch] gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig 2025-12-04T08:57:43.8755313Z * [new branch] gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base 2025-12-04T08:57:43.8756370Z * [new branch] gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head 2025-12-04T08:57:43.8757442Z * [new branch] gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig 2025-12-04T08:57:43.8759142Z * [new branch] gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base 2025-12-04T08:57:43.8760154Z * [new branch] gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head 2025-12-04T08:57:43.8761220Z * [new branch] gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig 2025-12-04T08:57:43.8762810Z * [new branch] gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base 2025-12-04T08:57:43.8763982Z * [new branch] gh/mikaylagawarecki/391/head -> origin/gh/mikaylagawarecki/391/head 2025-12-04T08:57:43.8765054Z * [new branch] gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig 2025-12-04T08:57:43.8766779Z * [new branch] gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base 2025-12-04T08:57:43.8767892Z * [new branch] gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head 2025-12-04T08:57:43.8769532Z * [new branch] gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig 2025-12-04T08:57:43.8771299Z * [new branch] gh/mlazos/41/base -> origin/gh/mlazos/41/base 2025-12-04T08:57:43.8772379Z * [new branch] gh/mlazos/41/head -> origin/gh/mlazos/41/head 2025-12-04T08:57:43.8773568Z * [new branch] gh/mlazos/41/orig -> origin/gh/mlazos/41/orig 2025-12-04T08:57:43.8775054Z * [new branch] gh/mlazos/42/base -> 
origin/gh/mlazos/42/base 2025-12-04T08:57:43.8776060Z * [new branch] gh/mlazos/42/head -> origin/gh/mlazos/42/head 2025-12-04T08:57:43.8777608Z * [new branch] gh/mlazos/42/orig -> origin/gh/mlazos/42/orig 2025-12-04T08:57:43.8778863Z * [new branch] gh/mlazos/43/base -> origin/gh/mlazos/43/base 2025-12-04T08:57:43.8780027Z * [new branch] gh/mlazos/43/head -> origin/gh/mlazos/43/head 2025-12-04T08:57:43.8781168Z * [new branch] gh/mlazos/43/orig -> origin/gh/mlazos/43/orig 2025-12-04T08:57:43.8782526Z * [new branch] gh/mlazos/44/base -> origin/gh/mlazos/44/base 2025-12-04T08:57:43.8783595Z * [new branch] gh/mlazos/44/head -> origin/gh/mlazos/44/head 2025-12-04T08:57:43.8784690Z * [new branch] gh/mlazos/44/orig -> origin/gh/mlazos/44/orig 2025-12-04T08:57:43.8786153Z * [new branch] gh/mlazos/47/base -> origin/gh/mlazos/47/base 2025-12-04T08:57:43.8787220Z * [new branch] gh/mlazos/47/head -> origin/gh/mlazos/47/head 2025-12-04T08:57:43.8788490Z * [new branch] gh/mlazos/47/orig -> origin/gh/mlazos/47/orig 2025-12-04T08:57:43.8789919Z * [new branch] gh/mlazos/48/base -> origin/gh/mlazos/48/base 2025-12-04T08:57:43.8790983Z * [new branch] gh/mlazos/48/head -> origin/gh/mlazos/48/head 2025-12-04T08:57:43.8792122Z * [new branch] gh/mlazos/48/orig -> origin/gh/mlazos/48/orig 2025-12-04T08:57:43.8793444Z * [new branch] gh/mlazos/49/base -> origin/gh/mlazos/49/base 2025-12-04T08:57:43.8794524Z * [new branch] gh/mlazos/49/head -> origin/gh/mlazos/49/head 2025-12-04T08:57:43.8795557Z * [new branch] gh/mlazos/49/orig -> origin/gh/mlazos/49/orig 2025-12-04T08:57:43.8797026Z * [new branch] gh/mlazos/50/base -> origin/gh/mlazos/50/base 2025-12-04T08:57:43.8798137Z * [new branch] gh/mlazos/50/head -> origin/gh/mlazos/50/head 2025-12-04T08:57:43.8799151Z * [new branch] gh/mlazos/50/orig -> origin/gh/mlazos/50/orig 2025-12-04T08:57:43.8800480Z * [new branch] gh/mlazos/51/base -> origin/gh/mlazos/51/base 2025-12-04T08:57:43.8801559Z * [new branch] gh/mlazos/51/head -> origin/gh/mlazos/51/head 2025-12-04T08:57:43.8802959Z * [new branch] gh/mlazos/51/orig -> origin/gh/mlazos/51/orig 2025-12-04T08:57:43.8804266Z * [new branch] gh/mlazos/52/base -> origin/gh/mlazos/52/base 2025-12-04T08:57:43.8805315Z * [new branch] gh/mlazos/52/head -> origin/gh/mlazos/52/head 2025-12-04T08:57:43.8806385Z * [new branch] gh/mlazos/52/orig -> origin/gh/mlazos/52/orig 2025-12-04T08:57:43.8807850Z * [new branch] gh/mlazos/53/base -> origin/gh/mlazos/53/base 2025-12-04T08:57:43.8809003Z * [new branch] gh/mlazos/53/head -> origin/gh/mlazos/53/head 2025-12-04T08:57:43.8810071Z * [new branch] gh/mlazos/53/orig -> origin/gh/mlazos/53/orig 2025-12-04T08:57:43.8811470Z * [new branch] gh/mlazos/54/base -> origin/gh/mlazos/54/base 2025-12-04T08:57:43.8812540Z * [new branch] gh/mlazos/54/head -> origin/gh/mlazos/54/head 2025-12-04T08:57:43.8813644Z * [new branch] gh/mlazos/54/orig -> origin/gh/mlazos/54/orig 2025-12-04T08:57:43.8814965Z * [new branch] gh/mlazos/55/base -> origin/gh/mlazos/55/base 2025-12-04T08:57:43.8816034Z * [new branch] gh/mlazos/55/head -> origin/gh/mlazos/55/head 2025-12-04T08:57:43.8818151Z * [new branch] gh/mlazos/55/orig -> origin/gh/mlazos/55/orig 2025-12-04T08:57:43.8819617Z * [new branch] gh/mlazos/56/base -> origin/gh/mlazos/56/base 2025-12-04T08:57:43.8821460Z * [new branch] gh/mlazos/56/head -> origin/gh/mlazos/56/head 2025-12-04T08:57:43.8822604Z * [new branch] gh/mlazos/56/orig -> origin/gh/mlazos/56/orig 2025-12-04T08:57:43.8824110Z * [new branch] gh/mlazos/57/base -> origin/gh/mlazos/57/base 
2025-12-04T08:57:43.8825174Z * [new branch] gh/mlazos/57/head -> origin/gh/mlazos/57/head 2025-12-04T08:57:43.8826269Z * [new branch] gh/mlazos/57/orig -> origin/gh/mlazos/57/orig 2025-12-04T08:57:43.8827739Z * [new branch] gh/mlazos/58/base -> origin/gh/mlazos/58/base 2025-12-04T08:57:43.8828899Z * [new branch] gh/mlazos/58/head -> origin/gh/mlazos/58/head 2025-12-04T08:57:43.8830091Z * [new branch] gh/mlazos/58/orig -> origin/gh/mlazos/58/orig 2025-12-04T08:57:43.8831501Z * [new branch] gh/mlazos/59/base -> origin/gh/mlazos/59/base 2025-12-04T08:57:43.8832648Z * [new branch] gh/mlazos/59/head -> origin/gh/mlazos/59/head 2025-12-04T08:57:43.8834039Z * [new branch] gh/mlazos/59/orig -> origin/gh/mlazos/59/orig 2025-12-04T08:57:43.8835526Z * [new branch] gh/mlazos/60/base -> origin/gh/mlazos/60/base 2025-12-04T08:57:43.8836613Z * [new branch] gh/mlazos/60/head -> origin/gh/mlazos/60/head 2025-12-04T08:57:43.8837801Z * [new branch] gh/mlazos/60/orig -> origin/gh/mlazos/60/orig 2025-12-04T08:57:43.8839951Z * [new branch] gh/mlazos/61/base -> origin/gh/mlazos/61/base 2025-12-04T08:57:43.8841089Z * [new branch] gh/mlazos/61/head -> origin/gh/mlazos/61/head 2025-12-04T08:57:43.8842189Z * [new branch] gh/mlazos/61/orig -> origin/gh/mlazos/61/orig 2025-12-04T08:57:43.8843669Z * [new branch] gh/mlazos/62/base -> origin/gh/mlazos/62/base 2025-12-04T08:57:43.8844731Z * [new branch] gh/mlazos/62/head -> origin/gh/mlazos/62/head 2025-12-04T08:57:43.8845871Z * [new branch] gh/mlazos/62/orig -> origin/gh/mlazos/62/orig 2025-12-04T08:57:43.8847402Z * [new branch] gh/mlazos/63/base -> origin/gh/mlazos/63/base 2025-12-04T08:57:43.8848553Z * [new branch] gh/mlazos/63/head -> origin/gh/mlazos/63/head 2025-12-04T08:57:43.8849804Z * [new branch] gh/mlazos/63/orig -> origin/gh/mlazos/63/orig 2025-12-04T08:57:43.8851177Z * [new branch] gh/mlazos/64/base -> origin/gh/mlazos/64/base 2025-12-04T08:57:43.8852263Z * [new branch] gh/mlazos/64/head -> origin/gh/mlazos/64/head 2025-12-04T08:57:43.8853314Z * [new branch] gh/mlazos/64/orig -> origin/gh/mlazos/64/orig 2025-12-04T08:57:43.8854776Z * [new branch] gh/mlazos/65/base -> origin/gh/mlazos/65/base 2025-12-04T08:57:43.8855843Z * [new branch] gh/mlazos/65/head -> origin/gh/mlazos/65/head 2025-12-04T08:57:43.8857272Z * [new branch] gh/mlazos/65/orig -> origin/gh/mlazos/65/orig 2025-12-04T08:57:43.8858763Z * [new branch] gh/mlazos/66/base -> origin/gh/mlazos/66/base 2025-12-04T08:57:43.8859867Z * [new branch] gh/mlazos/66/head -> origin/gh/mlazos/66/head 2025-12-04T08:57:43.8861039Z * [new branch] gh/mlazos/66/orig -> origin/gh/mlazos/66/orig 2025-12-04T08:57:43.8862543Z * [new branch] gh/mlazos/67/base -> origin/gh/mlazos/67/base 2025-12-04T08:57:43.8863649Z * [new branch] gh/mlazos/67/head -> origin/gh/mlazos/67/head 2025-12-04T08:57:43.8864863Z * [new branch] gh/mlazos/67/orig -> origin/gh/mlazos/67/orig 2025-12-04T08:57:43.8866349Z * [new branch] gh/mlazos/68/base -> origin/gh/mlazos/68/base 2025-12-04T08:57:43.8867452Z * [new branch] gh/mlazos/68/head -> origin/gh/mlazos/68/head 2025-12-04T08:57:43.8868554Z * [new branch] gh/mlazos/68/orig -> origin/gh/mlazos/68/orig 2025-12-04T08:57:43.8870176Z * [new branch] gh/mlazos/69/base -> origin/gh/mlazos/69/base 2025-12-04T08:57:43.8873176Z * [new branch] gh/mlazos/69/head -> origin/gh/mlazos/69/head 2025-12-04T08:57:43.8873420Z * [new branch] gh/mlazos/69/orig -> origin/gh/mlazos/69/orig 2025-12-04T08:57:43.8874691Z * [new branch] gh/mlazos/70/base -> origin/gh/mlazos/70/base 2025-12-04T08:57:43.8875257Z * [new branch] 
gh/mlazos/70/head -> origin/gh/mlazos/70/head 2025-12-04T08:57:43.8876440Z * [new branch] gh/mlazos/70/orig -> origin/gh/mlazos/70/orig 2025-12-04T08:57:43.8877966Z * [new branch] gh/mlazos/71/base -> origin/gh/mlazos/71/base 2025-12-04T08:57:43.8879088Z * [new branch] gh/mlazos/71/head -> origin/gh/mlazos/71/head 2025-12-04T08:57:43.8880271Z * [new branch] gh/mlazos/71/orig -> origin/gh/mlazos/71/orig 2025-12-04T08:57:43.8881715Z * [new branch] gh/mlazos/72/base -> origin/gh/mlazos/72/base 2025-12-04T08:57:43.8882786Z * [new branch] gh/mlazos/72/head -> origin/gh/mlazos/72/head 2025-12-04T08:57:43.8883998Z * [new branch] gh/mlazos/72/orig -> origin/gh/mlazos/72/orig 2025-12-04T08:57:43.8885435Z * [new branch] gh/mlazos/73/base -> origin/gh/mlazos/73/base 2025-12-04T08:57:43.8886552Z * [new branch] gh/mlazos/73/head -> origin/gh/mlazos/73/head 2025-12-04T08:57:43.8887714Z * [new branch] gh/mlazos/73/orig -> origin/gh/mlazos/73/orig 2025-12-04T08:57:43.8889418Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-12-04T08:57:43.8890603Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-12-04T08:57:43.8892405Z * [new branch] gh/muchulee8/73/base -> origin/gh/muchulee8/73/base 2025-12-04T08:57:43.8893744Z * [new branch] gh/muchulee8/73/head -> origin/gh/muchulee8/73/head 2025-12-04T08:57:43.8894930Z * [new branch] gh/muchulee8/73/orig -> origin/gh/muchulee8/73/orig 2025-12-04T08:57:43.8897031Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-12-04T08:57:43.8898281Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-12-04T08:57:43.8899555Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-12-04T08:57:43.8901266Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-12-04T08:57:43.8902397Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-12-04T08:57:43.8903562Z * [new branch] gh/naveenthangudu/2/orig -> origin/gh/naveenthangudu/2/orig 2025-12-04T08:57:43.8904990Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-12-04T08:57:43.8906100Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-12-04T08:57:43.8907304Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 2025-12-04T08:57:43.8908965Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-12-04T08:57:43.8910059Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-12-04T08:57:43.8911246Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-12-04T08:57:43.8912672Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-12-04T08:57:43.8913756Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-12-04T08:57:43.8915165Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-12-04T08:57:43.8916631Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-12-04T08:57:43.8917688Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-12-04T08:57:43.8918719Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-12-04T08:57:43.8920105Z * [new branch] gh/naveenthangudu/7/base -> origin/gh/naveenthangudu/7/base 2025-12-04T08:57:43.8921539Z * [new branch] gh/naveenthangudu/7/head -> origin/gh/naveenthangudu/7/head 2025-12-04T08:57:43.8922715Z * [new branch] 
gh/naveenthangudu/7/orig -> origin/gh/naveenthangudu/7/orig 2025-12-04T08:57:43.8924182Z * [new branch] gh/naveenthangudu/8/base -> origin/gh/naveenthangudu/8/base 2025-12-04T08:57:43.8925325Z * [new branch] gh/naveenthangudu/8/head -> origin/gh/naveenthangudu/8/head 2025-12-04T08:57:43.8926528Z * [new branch] gh/naveenthangudu/8/orig -> origin/gh/naveenthangudu/8/orig 2025-12-04T08:57:43.8927999Z * [new branch] gh/naveenthangudu/9/base -> origin/gh/naveenthangudu/9/base 2025-12-04T08:57:43.8929110Z * [new branch] gh/naveenthangudu/9/head -> origin/gh/naveenthangudu/9/head 2025-12-04T08:57:43.8930383Z * [new branch] gh/naveenthangudu/9/orig -> origin/gh/naveenthangudu/9/orig 2025-12-04T08:57:43.8932020Z * [new branch] gh/nikitaved/1/base -> origin/gh/nikitaved/1/base 2025-12-04T08:57:43.8933205Z * [new branch] gh/nikitaved/1/head -> origin/gh/nikitaved/1/head 2025-12-04T08:57:43.8934390Z * [new branch] gh/nikitaved/1/orig -> origin/gh/nikitaved/1/orig 2025-12-04T08:57:43.8935894Z * [new branch] gh/nikitaved/10/base -> origin/gh/nikitaved/10/base 2025-12-04T08:57:43.8937307Z * [new branch] gh/nikitaved/10/head -> origin/gh/nikitaved/10/head 2025-12-04T08:57:43.8938448Z * [new branch] gh/nikitaved/10/orig -> origin/gh/nikitaved/10/orig 2025-12-04T08:57:43.8939958Z * [new branch] gh/nikitaved/11/base -> origin/gh/nikitaved/11/base 2025-12-04T08:57:43.8941158Z * [new branch] gh/nikitaved/11/head -> origin/gh/nikitaved/11/head 2025-12-04T08:57:43.8942334Z * [new branch] gh/nikitaved/11/orig -> origin/gh/nikitaved/11/orig 2025-12-04T08:57:43.8943750Z * [new branch] gh/nikitaved/12/base -> origin/gh/nikitaved/12/base 2025-12-04T08:57:43.8944873Z * [new branch] gh/nikitaved/12/head -> origin/gh/nikitaved/12/head 2025-12-04T08:57:43.8945986Z * [new branch] gh/nikitaved/12/orig -> origin/gh/nikitaved/12/orig 2025-12-04T08:57:43.8947458Z * [new branch] gh/nikitaved/13/base -> origin/gh/nikitaved/13/base 2025-12-04T08:57:43.8948693Z * [new branch] gh/nikitaved/13/head -> origin/gh/nikitaved/13/head 2025-12-04T08:57:43.8949789Z * [new branch] gh/nikitaved/13/orig -> origin/gh/nikitaved/13/orig 2025-12-04T08:57:43.8951253Z * [new branch] gh/nikitaved/14/base -> origin/gh/nikitaved/14/base 2025-12-04T08:57:43.8952447Z * [new branch] gh/nikitaved/14/head -> origin/gh/nikitaved/14/head 2025-12-04T08:57:43.8953560Z * [new branch] gh/nikitaved/14/orig -> origin/gh/nikitaved/14/orig 2025-12-04T08:57:43.8954993Z * [new branch] gh/nikitaved/15/base -> origin/gh/nikitaved/15/base 2025-12-04T08:57:43.8956107Z * [new branch] gh/nikitaved/15/head -> origin/gh/nikitaved/15/head 2025-12-04T08:57:43.8957187Z * [new branch] gh/nikitaved/15/orig -> origin/gh/nikitaved/15/orig 2025-12-04T08:57:43.8958632Z * [new branch] gh/nikitaved/16/base -> origin/gh/nikitaved/16/base 2025-12-04T08:57:43.8959696Z * [new branch] gh/nikitaved/16/head -> origin/gh/nikitaved/16/head 2025-12-04T08:57:43.8960792Z * [new branch] gh/nikitaved/16/orig -> origin/gh/nikitaved/16/orig 2025-12-04T08:57:43.8962689Z * [new branch] gh/nikitaved/2/base -> origin/gh/nikitaved/2/base 2025-12-04T08:57:43.8963785Z * [new branch] gh/nikitaved/2/head -> origin/gh/nikitaved/2/head 2025-12-04T08:57:43.8964893Z * [new branch] gh/nikitaved/2/orig -> origin/gh/nikitaved/2/orig 2025-12-04T08:57:43.8966896Z * [new branch] gh/nikitaved/4/base -> origin/gh/nikitaved/4/base 2025-12-04T08:57:43.8968001Z * [new branch] gh/nikitaved/4/head -> origin/gh/nikitaved/4/head 2025-12-04T08:57:43.8969074Z * [new branch] gh/nikitaved/4/orig -> origin/gh/nikitaved/4/orig 
2025-12-04T08:57:43.8970699Z * [new branch] gh/nikitaved/5/base -> origin/gh/nikitaved/5/base 2025-12-04T08:57:43.8971780Z * [new branch] gh/nikitaved/5/head -> origin/gh/nikitaved/5/head 2025-12-04T08:57:43.8972852Z * [new branch] gh/nikitaved/5/orig -> origin/gh/nikitaved/5/orig 2025-12-04T08:57:43.8974258Z * [new branch] gh/nikitaved/6/base -> origin/gh/nikitaved/6/base 2025-12-04T08:57:43.8975408Z * [new branch] gh/nikitaved/6/head -> origin/gh/nikitaved/6/head 2025-12-04T08:57:43.8976480Z * [new branch] gh/nikitaved/6/orig -> origin/gh/nikitaved/6/orig 2025-12-04T08:57:43.8978273Z * [new branch] gh/nikitaved/8/base -> origin/gh/nikitaved/8/base 2025-12-04T08:57:43.8979446Z * [new branch] gh/nikitaved/8/head -> origin/gh/nikitaved/8/head 2025-12-04T08:57:43.8980567Z * [new branch] gh/nikitaved/8/orig -> origin/gh/nikitaved/8/orig 2025-12-04T08:57:43.8982032Z * [new branch] gh/nikitaved/9/base -> origin/gh/nikitaved/9/base 2025-12-04T08:57:43.8983150Z * [new branch] gh/nikitaved/9/head -> origin/gh/nikitaved/9/head 2025-12-04T08:57:43.8984236Z * [new branch] gh/nikitaved/9/orig -> origin/gh/nikitaved/9/orig 2025-12-04T08:57:43.8986059Z * [new branch] gh/oulgen/10/base -> origin/gh/oulgen/10/base 2025-12-04T08:57:43.8987146Z * [new branch] gh/oulgen/10/head -> origin/gh/oulgen/10/head 2025-12-04T08:57:43.8988303Z * [new branch] gh/oulgen/10/orig -> origin/gh/oulgen/10/orig 2025-12-04T08:57:43.8989894Z * [new branch] gh/oulgen/11/base -> origin/gh/oulgen/11/base 2025-12-04T08:57:43.8990977Z * [new branch] gh/oulgen/11/head -> origin/gh/oulgen/11/head 2025-12-04T08:57:43.8992134Z * [new branch] gh/oulgen/11/orig -> origin/gh/oulgen/11/orig 2025-12-04T08:57:43.8993589Z * [new branch] gh/oulgen/12/base -> origin/gh/oulgen/12/base 2025-12-04T08:57:43.8994603Z * [new branch] gh/oulgen/12/head -> origin/gh/oulgen/12/head 2025-12-04T08:57:43.8995649Z * [new branch] gh/oulgen/12/orig -> origin/gh/oulgen/12/orig 2025-12-04T08:57:43.8997103Z * [new branch] gh/oulgen/13/base -> origin/gh/oulgen/13/base 2025-12-04T08:57:43.8998193Z * [new branch] gh/oulgen/13/head -> origin/gh/oulgen/13/head 2025-12-04T08:57:43.8999333Z * [new branch] gh/oulgen/13/orig -> origin/gh/oulgen/13/orig 2025-12-04T08:57:43.9000761Z * [new branch] gh/oulgen/14/base -> origin/gh/oulgen/14/base 2025-12-04T08:57:43.9001820Z * [new branch] gh/oulgen/14/head -> origin/gh/oulgen/14/head 2025-12-04T08:57:43.9002984Z * [new branch] gh/oulgen/14/orig -> origin/gh/oulgen/14/orig 2025-12-04T08:57:43.9004377Z * [new branch] gh/oulgen/15/base -> origin/gh/oulgen/15/base 2025-12-04T08:57:43.9005459Z * [new branch] gh/oulgen/15/head -> origin/gh/oulgen/15/head 2025-12-04T08:57:43.9006601Z * [new branch] gh/oulgen/15/orig -> origin/gh/oulgen/15/orig 2025-12-04T08:57:43.9008024Z * [new branch] gh/oulgen/16/base -> origin/gh/oulgen/16/base 2025-12-04T08:57:43.9009078Z * [new branch] gh/oulgen/16/head -> origin/gh/oulgen/16/head 2025-12-04T08:57:43.9010307Z * [new branch] gh/oulgen/16/orig -> origin/gh/oulgen/16/orig 2025-12-04T08:57:43.9011562Z * [new branch] gh/oulgen/17/base -> origin/gh/oulgen/17/base 2025-12-04T08:57:43.9012653Z * [new branch] gh/oulgen/17/head -> origin/gh/oulgen/17/head 2025-12-04T08:57:43.9013782Z * [new branch] gh/oulgen/17/orig -> origin/gh/oulgen/17/orig 2025-12-04T08:57:43.9015220Z * [new branch] gh/oulgen/18/base -> origin/gh/oulgen/18/base 2025-12-04T08:57:43.9016406Z * [new branch] gh/oulgen/18/head -> origin/gh/oulgen/18/head 2025-12-04T08:57:43.9017951Z * [new branch] gh/oulgen/18/orig -> 
origin/gh/oulgen/18/orig 2025-12-04T08:57:43.9019276Z * [new branch] gh/oulgen/19/base -> origin/gh/oulgen/19/base 2025-12-04T08:57:43.9020320Z * [new branch] gh/oulgen/19/head -> origin/gh/oulgen/19/head 2025-12-04T08:57:43.9023912Z * [new branch] gh/oulgen/19/orig -> origin/gh/oulgen/19/orig 2025-12-04T08:57:43.9025534Z * [new branch] gh/oulgen/20/base -> origin/gh/oulgen/20/base 2025-12-04T08:57:43.9026702Z * [new branch] gh/oulgen/20/head -> origin/gh/oulgen/20/head 2025-12-04T08:57:43.9027852Z * [new branch] gh/oulgen/20/orig -> origin/gh/oulgen/20/orig 2025-12-04T08:57:43.9029265Z * [new branch] gh/oulgen/21/base -> origin/gh/oulgen/21/base 2025-12-04T08:57:43.9031129Z * [new branch] gh/oulgen/21/head -> origin/gh/oulgen/21/head 2025-12-04T08:57:43.9032185Z * [new branch] gh/oulgen/21/orig -> origin/gh/oulgen/21/orig 2025-12-04T08:57:43.9033726Z * [new branch] gh/oulgen/22/base -> origin/gh/oulgen/22/base 2025-12-04T08:57:43.9034829Z * [new branch] gh/oulgen/22/head -> origin/gh/oulgen/22/head 2025-12-04T08:57:43.9035902Z * [new branch] gh/oulgen/22/orig -> origin/gh/oulgen/22/orig 2025-12-04T08:57:43.9037313Z * [new branch] gh/oulgen/23/base -> origin/gh/oulgen/23/base 2025-12-04T08:57:43.9038446Z * [new branch] gh/oulgen/23/head -> origin/gh/oulgen/23/head 2025-12-04T08:57:43.9039507Z * [new branch] gh/oulgen/23/orig -> origin/gh/oulgen/23/orig 2025-12-04T08:57:43.9040843Z * [new branch] gh/oulgen/24/base -> origin/gh/oulgen/24/base 2025-12-04T08:57:43.9041884Z * [new branch] gh/oulgen/24/head -> origin/gh/oulgen/24/head 2025-12-04T08:57:43.9042998Z * [new branch] gh/oulgen/24/orig -> origin/gh/oulgen/24/orig 2025-12-04T08:57:43.9044420Z * [new branch] gh/oulgen/25/base -> origin/gh/oulgen/25/base 2025-12-04T08:57:43.9045456Z * [new branch] gh/oulgen/25/head -> origin/gh/oulgen/25/head 2025-12-04T08:57:43.9046691Z * [new branch] gh/oulgen/25/orig -> origin/gh/oulgen/25/orig 2025-12-04T08:57:43.9048085Z * [new branch] gh/oulgen/26/base -> origin/gh/oulgen/26/base 2025-12-04T08:57:43.9049157Z * [new branch] gh/oulgen/26/head -> origin/gh/oulgen/26/head 2025-12-04T08:57:43.9050222Z * [new branch] gh/oulgen/26/orig -> origin/gh/oulgen/26/orig 2025-12-04T08:57:43.9051622Z * [new branch] gh/oulgen/4/base -> origin/gh/oulgen/4/base 2025-12-04T08:57:43.9052716Z * [new branch] gh/oulgen/4/head -> origin/gh/oulgen/4/head 2025-12-04T08:57:43.9053822Z * [new branch] gh/oulgen/4/orig -> origin/gh/oulgen/4/orig 2025-12-04T08:57:43.9055706Z * [new branch] gh/oulgen/7/base -> origin/gh/oulgen/7/base 2025-12-04T08:57:43.9057084Z * [new branch] gh/oulgen/7/head -> origin/gh/oulgen/7/head 2025-12-04T08:57:43.9058250Z * [new branch] gh/oulgen/7/orig -> origin/gh/oulgen/7/orig 2025-12-04T08:57:43.9059799Z * [new branch] gh/oulgen/8/base -> origin/gh/oulgen/8/base 2025-12-04T08:57:43.9060923Z * [new branch] gh/oulgen/8/head -> origin/gh/oulgen/8/head 2025-12-04T08:57:43.9062174Z * [new branch] gh/oulgen/8/orig -> origin/gh/oulgen/8/orig 2025-12-04T08:57:43.9063590Z * [new branch] gh/oulgen/9/base -> origin/gh/oulgen/9/base 2025-12-04T08:57:43.9064689Z * [new branch] gh/oulgen/9/head -> origin/gh/oulgen/9/head 2025-12-04T08:57:43.9065863Z * [new branch] gh/oulgen/9/orig -> origin/gh/oulgen/9/orig 2025-12-04T08:57:43.9067563Z * [new branch] gh/patvig/mtia-serialization -> origin/gh/patvig/mtia-serialization 2025-12-04T08:57:43.9069464Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-12-04T08:57:43.9070555Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 
2025-12-04T08:57:43.9071769Z * [new branch] gh/pearu/108/orig -> origin/gh/pearu/108/orig 2025-12-04T08:57:43.9073237Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-12-04T08:57:43.9074373Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-12-04T08:57:43.9075471Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-12-04T08:57:43.9077060Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-12-04T08:57:43.9078166Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-12-04T08:57:43.9079293Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-12-04T08:57:43.9080700Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-12-04T08:57:43.9081763Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-12-04T08:57:43.9082911Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-12-04T08:57:43.9084387Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-12-04T08:57:43.9085432Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-12-04T08:57:43.9086506Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-12-04T08:57:43.9087857Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-12-04T08:57:43.9089012Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-12-04T08:57:43.9090063Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-12-04T08:57:43.9091486Z * [new branch] gh/pearu/116/base -> origin/gh/pearu/116/base 2025-12-04T08:57:43.9092584Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-12-04T08:57:43.9093677Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-12-04T08:57:43.9095062Z * [new branch] gh/pearu/117/base -> origin/gh/pearu/117/base 2025-12-04T08:57:43.9096322Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-12-04T08:57:43.9097783Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-12-04T08:57:43.9099300Z * [new branch] gh/pearu/118/base -> origin/gh/pearu/118/base 2025-12-04T08:57:43.9100404Z * [new branch] gh/pearu/118/head -> origin/gh/pearu/118/head 2025-12-04T08:57:43.9101596Z * [new branch] gh/pearu/118/orig -> origin/gh/pearu/118/orig 2025-12-04T08:57:43.9103036Z * [new branch] gh/pearu/119/base -> origin/gh/pearu/119/base 2025-12-04T08:57:43.9104146Z * [new branch] gh/pearu/119/head -> origin/gh/pearu/119/head 2025-12-04T08:57:43.9105252Z * [new branch] gh/pearu/119/orig -> origin/gh/pearu/119/orig 2025-12-04T08:57:43.9107302Z * [new branch] gh/pearu/139/base -> origin/gh/pearu/139/base 2025-12-04T08:57:43.9108437Z * [new branch] gh/pearu/139/head -> origin/gh/pearu/139/head 2025-12-04T08:57:43.9109648Z * [new branch] gh/pearu/139/orig -> origin/gh/pearu/139/orig 2025-12-04T08:57:43.9111119Z * [new branch] gh/pearu/140/base -> origin/gh/pearu/140/base 2025-12-04T08:57:43.9112217Z * [new branch] gh/pearu/140/head -> origin/gh/pearu/140/head 2025-12-04T08:57:43.9113399Z * [new branch] gh/pearu/140/orig -> origin/gh/pearu/140/orig 2025-12-04T08:57:43.9114763Z * [new branch] gh/pearu/142/base -> origin/gh/pearu/142/base 2025-12-04T08:57:43.9115837Z * [new branch] gh/pearu/142/head -> origin/gh/pearu/142/head 2025-12-04T08:57:43.9116904Z * [new branch] gh/pearu/142/orig -> origin/gh/pearu/142/orig 2025-12-04T08:57:43.9118312Z * [new branch] gh/pearu/143/base -> origin/gh/pearu/143/base 2025-12-04T08:57:43.9119387Z * [new branch] gh/pearu/143/head -> origin/gh/pearu/143/head 2025-12-04T08:57:43.9120470Z * [new branch] 
gh/pearu/143/orig -> origin/gh/pearu/143/orig 2025-12-04T08:57:43.9122554Z * [new branch] gh/pearu/147/base -> origin/gh/pearu/147/base 2025-12-04T08:57:43.9123650Z * [new branch] gh/pearu/147/head -> origin/gh/pearu/147/head 2025-12-04T08:57:43.9124773Z * [new branch] gh/pearu/147/orig -> origin/gh/pearu/147/orig 2025-12-04T08:57:43.9126258Z * [new branch] gh/pearu/149/base -> origin/gh/pearu/149/base 2025-12-04T08:57:43.9127361Z * [new branch] gh/pearu/149/head -> origin/gh/pearu/149/head 2025-12-04T08:57:43.9128468Z * [new branch] gh/pearu/149/orig -> origin/gh/pearu/149/orig 2025-12-04T08:57:43.9130378Z * [new branch] gh/pearu/150/base -> origin/gh/pearu/150/base 2025-12-04T08:57:43.9131542Z * [new branch] gh/pearu/150/head -> origin/gh/pearu/150/head 2025-12-04T08:57:43.9132646Z * [new branch] gh/pearu/150/orig -> origin/gh/pearu/150/orig 2025-12-04T08:57:43.9134249Z * [new branch] gh/pearu/151/base -> origin/gh/pearu/151/base 2025-12-04T08:57:43.9135342Z * [new branch] gh/pearu/151/head -> origin/gh/pearu/151/head 2025-12-04T08:57:43.9136506Z * [new branch] gh/pearu/151/orig -> origin/gh/pearu/151/orig 2025-12-04T08:57:43.9138442Z * [new branch] gh/pearu/152/base -> origin/gh/pearu/152/base 2025-12-04T08:57:43.9139616Z * [new branch] gh/pearu/152/head -> origin/gh/pearu/152/head 2025-12-04T08:57:43.9140738Z * [new branch] gh/pearu/152/orig -> origin/gh/pearu/152/orig 2025-12-04T08:57:43.9142682Z * [new branch] gh/pearu/153/base -> origin/gh/pearu/153/base 2025-12-04T08:57:43.9143798Z * [new branch] gh/pearu/153/head -> origin/gh/pearu/153/head 2025-12-04T08:57:43.9144997Z * [new branch] gh/pearu/153/orig -> origin/gh/pearu/153/orig 2025-12-04T08:57:43.9146417Z * [new branch] gh/pearu/154/base -> origin/gh/pearu/154/base 2025-12-04T08:57:43.9148019Z * [new branch] gh/pearu/154/head -> origin/gh/pearu/154/head 2025-12-04T08:57:43.9149273Z * [new branch] gh/pearu/154/orig -> origin/gh/pearu/154/orig 2025-12-04T08:57:43.9150816Z * [new branch] gh/pearu/155/base -> origin/gh/pearu/155/base 2025-12-04T08:57:43.9151900Z * [new branch] gh/pearu/155/head -> origin/gh/pearu/155/head 2025-12-04T08:57:43.9153027Z * [new branch] gh/pearu/155/orig -> origin/gh/pearu/155/orig 2025-12-04T08:57:43.9154589Z * [new branch] gh/pearu/156/base -> origin/gh/pearu/156/base 2025-12-04T08:57:43.9155687Z * [new branch] gh/pearu/156/head -> origin/gh/pearu/156/head 2025-12-04T08:57:43.9156752Z * [new branch] gh/pearu/156/orig -> origin/gh/pearu/156/orig 2025-12-04T08:57:43.9158662Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-12-04T08:57:43.9160453Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-12-04T08:57:43.9161743Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-12-04T08:57:43.9163297Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-12-04T08:57:43.9164558Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-12-04T08:57:43.9165637Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-12-04T08:57:43.9167400Z * [new branch] gh/pianpwk/21/base -> origin/gh/pianpwk/21/base 2025-12-04T08:57:43.9168448Z * [new branch] gh/pianpwk/21/head -> origin/gh/pianpwk/21/head 2025-12-04T08:57:43.9170007Z * [new branch] gh/pianpwk/28/base -> origin/gh/pianpwk/28/base 2025-12-04T08:57:43.9171075Z * [new branch] gh/pianpwk/28/head -> origin/gh/pianpwk/28/head 2025-12-04T08:57:43.9172180Z * [new branch] gh/pianpwk/28/orig -> origin/gh/pianpwk/28/orig 2025-12-04T08:57:43.9173643Z * [new branch] gh/pianpwk/29/base -> 
origin/gh/pianpwk/29/base 2025-12-04T08:57:43.9174886Z * [new branch] gh/pianpwk/29/head -> origin/gh/pianpwk/29/head 2025-12-04T08:57:43.9176010Z * [new branch] gh/pianpwk/29/orig -> origin/gh/pianpwk/29/orig 2025-12-04T08:57:43.9178030Z * [new branch] gh/pianpwk/30/base -> origin/gh/pianpwk/30/base 2025-12-04T08:57:43.9179063Z * [new branch] gh/pianpwk/30/head -> origin/gh/pianpwk/30/head 2025-12-04T08:57:43.9180242Z * [new branch] gh/pianpwk/30/orig -> origin/gh/pianpwk/30/orig 2025-12-04T08:57:43.9181772Z * [new branch] gh/pianpwk/31/base -> origin/gh/pianpwk/31/base 2025-12-04T08:57:43.9182937Z * [new branch] gh/pianpwk/31/head -> origin/gh/pianpwk/31/head 2025-12-04T08:57:43.9184049Z * [new branch] gh/pianpwk/31/orig -> origin/gh/pianpwk/31/orig 2025-12-04T08:57:43.9185482Z * [new branch] gh/pianpwk/32/base -> origin/gh/pianpwk/32/base 2025-12-04T08:57:43.9186607Z * [new branch] gh/pianpwk/32/head -> origin/gh/pianpwk/32/head 2025-12-04T08:57:43.9187716Z * [new branch] gh/pianpwk/32/orig -> origin/gh/pianpwk/32/orig 2025-12-04T08:57:43.9189158Z * [new branch] gh/pianpwk/33/base -> origin/gh/pianpwk/33/base 2025-12-04T08:57:43.9190235Z * [new branch] gh/pianpwk/33/head -> origin/gh/pianpwk/33/head 2025-12-04T08:57:43.9191334Z * [new branch] gh/pianpwk/33/orig -> origin/gh/pianpwk/33/orig 2025-12-04T08:57:43.9193123Z * [new branch] gh/pianpwk/34/base -> origin/gh/pianpwk/34/base 2025-12-04T08:57:43.9194549Z * [new branch] gh/pianpwk/34/head -> origin/gh/pianpwk/34/head 2025-12-04T08:57:43.9195791Z * [new branch] gh/pianpwk/34/orig -> origin/gh/pianpwk/34/orig 2025-12-04T08:57:43.9197252Z * [new branch] gh/pianpwk/35/base -> origin/gh/pianpwk/35/base 2025-12-04T08:57:43.9198399Z * [new branch] gh/pianpwk/35/head -> origin/gh/pianpwk/35/head 2025-12-04T08:57:43.9199634Z * [new branch] gh/pianpwk/35/orig -> origin/gh/pianpwk/35/orig 2025-12-04T08:57:43.9201449Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-12-04T08:57:43.9202550Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-12-04T08:57:43.9203944Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-12-04T08:57:43.9205026Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-12-04T08:57:43.9206095Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-12-04T08:57:43.9207538Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-12-04T08:57:43.9208808Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 2025-12-04T08:57:43.9209700Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-12-04T08:57:43.9211325Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 2025-12-04T08:57:43.9212408Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-12-04T08:57:43.9213480Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-12-04T08:57:43.9215050Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-12-04T08:57:43.9216114Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-12-04T08:57:43.9217643Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-12-04T08:57:43.9219033Z * [new branch] gh/rec/167/base -> origin/gh/rec/167/base 2025-12-04T08:57:43.9220139Z * [new branch] gh/rec/167/head -> origin/gh/rec/167/head 2025-12-04T08:57:43.9221441Z * [new branch] gh/rec/167/orig -> origin/gh/rec/167/orig 2025-12-04T08:57:43.9222978Z * [new branch] gh/rec/168/base -> origin/gh/rec/168/base 2025-12-04T08:57:43.9224082Z * [new branch] gh/rec/168/head -> origin/gh/rec/168/head 2025-12-04T08:57:43.9225210Z * [new branch] 
gh/rec/168/orig -> origin/gh/rec/168/orig 2025-12-04T08:57:43.9226692Z * [new branch] gh/rec/169/base -> origin/gh/rec/169/base 2025-12-04T08:57:43.9227789Z * [new branch] gh/rec/169/head -> origin/gh/rec/169/head 2025-12-04T08:57:43.9228915Z * [new branch] gh/rec/169/orig -> origin/gh/rec/169/orig 2025-12-04T08:57:43.9230611Z * [new branch] gh/rec/170/base -> origin/gh/rec/170/base 2025-12-04T08:57:43.9231762Z * [new branch] gh/rec/170/head -> origin/gh/rec/170/head 2025-12-04T08:57:43.9232990Z * [new branch] gh/rec/170/orig -> origin/gh/rec/170/orig 2025-12-04T08:57:43.9234437Z * [new branch] gh/rec/171/base -> origin/gh/rec/171/base 2025-12-04T08:57:43.9235502Z * [new branch] gh/rec/171/head -> origin/gh/rec/171/head 2025-12-04T08:57:43.9236558Z * [new branch] gh/rec/171/orig -> origin/gh/rec/171/orig 2025-12-04T08:57:43.9237982Z * [new branch] gh/rec/172/base -> origin/gh/rec/172/base 2025-12-04T08:57:43.9239027Z * [new branch] gh/rec/172/head -> origin/gh/rec/172/head 2025-12-04T08:57:43.9240167Z * [new branch] gh/rec/172/orig -> origin/gh/rec/172/orig 2025-12-04T08:57:43.9241684Z * [new branch] gh/rec/173/base -> origin/gh/rec/173/base 2025-12-04T08:57:43.9242764Z * [new branch] gh/rec/173/head -> origin/gh/rec/173/head 2025-12-04T08:57:43.9243893Z * [new branch] gh/rec/173/orig -> origin/gh/rec/173/orig 2025-12-04T08:57:43.9245392Z * [new branch] gh/rec/174/base -> origin/gh/rec/174/base 2025-12-04T08:57:43.9246460Z * [new branch] gh/rec/174/head -> origin/gh/rec/174/head 2025-12-04T08:57:43.9247601Z * [new branch] gh/rec/174/orig -> origin/gh/rec/174/orig 2025-12-04T08:57:43.9249001Z * [new branch] gh/rec/175/base -> origin/gh/rec/175/base 2025-12-04T08:57:43.9250048Z * [new branch] gh/rec/175/head -> origin/gh/rec/175/head 2025-12-04T08:57:43.9251173Z * [new branch] gh/rec/175/orig -> origin/gh/rec/175/orig 2025-12-04T08:57:43.9252574Z * [new branch] gh/rec/176/base -> origin/gh/rec/176/base 2025-12-04T08:57:43.9253639Z * [new branch] gh/rec/176/head -> origin/gh/rec/176/head 2025-12-04T08:57:43.9254841Z * [new branch] gh/rec/176/orig -> origin/gh/rec/176/orig 2025-12-04T08:57:43.9256149Z * [new branch] gh/rec/177/base -> origin/gh/rec/177/base 2025-12-04T08:57:43.9257591Z * [new branch] gh/rec/177/head -> origin/gh/rec/177/head 2025-12-04T08:57:43.9258658Z * [new branch] gh/rec/177/orig -> origin/gh/rec/177/orig 2025-12-04T08:57:43.9260636Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-12-04T08:57:43.9262266Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-12-04T08:57:43.9263427Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-12-04T08:57:43.9264925Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-12-04T08:57:43.9266071Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-12-04T08:57:43.9267193Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-12-04T08:57:43.9268831Z * [new branch] gh/robert-hardwick/5/base -> origin/gh/robert-hardwick/5/base 2025-12-04T08:57:43.9269920Z * [new branch] gh/robert-hardwick/5/head -> origin/gh/robert-hardwick/5/head 2025-12-04T08:57:43.9271115Z * [new branch] gh/robert-hardwick/5/orig -> origin/gh/robert-hardwick/5/orig 2025-12-04T08:57:43.9272564Z * [new branch] gh/robert-hardwick/6/base -> origin/gh/robert-hardwick/6/base 2025-12-04T08:57:43.9273682Z * [new branch] gh/robert-hardwick/6/head -> origin/gh/robert-hardwick/6/head 
2025-12-04T08:57:43.9274849Z * [new branch] gh/robert-hardwick/6/orig -> origin/gh/robert-hardwick/6/orig 2025-12-04T08:57:43.9276272Z * [new branch] gh/robert-hardwick/7/base -> origin/gh/robert-hardwick/7/base 2025-12-04T08:57:43.9277361Z * [new branch] gh/robert-hardwick/7/head -> origin/gh/robert-hardwick/7/head 2025-12-04T08:57:43.9278489Z * [new branch] gh/robert-hardwick/7/orig -> origin/gh/robert-hardwick/7/orig 2025-12-04T08:57:43.9279952Z * [new branch] gh/robert-hardwick/8/base -> origin/gh/robert-hardwick/8/base 2025-12-04T08:57:43.9281022Z * [new branch] gh/robert-hardwick/8/head -> origin/gh/robert-hardwick/8/head 2025-12-04T08:57:43.9282142Z * [new branch] gh/robert-hardwick/8/orig -> origin/gh/robert-hardwick/8/orig 2025-12-04T08:57:43.9283575Z * [new branch] gh/robert-hardwick/9/base -> origin/gh/robert-hardwick/9/base 2025-12-04T08:57:43.9284784Z * [new branch] gh/robert-hardwick/9/head -> origin/gh/robert-hardwick/9/head 2025-12-04T08:57:43.9285924Z * [new branch] gh/robert-hardwick/9/orig -> origin/gh/robert-hardwick/9/orig 2025-12-04T08:57:43.9287626Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-12-04T08:57:43.9288809Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-12-04T08:57:43.9290174Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-12-04T08:57:43.9291206Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-12-04T08:57:43.9292695Z * [new branch] gh/rtimpe/22/base -> origin/gh/rtimpe/22/base 2025-12-04T08:57:43.9293780Z * [new branch] gh/rtimpe/22/head -> origin/gh/rtimpe/22/head 2025-12-04T08:57:43.9295001Z * [new branch] gh/rtimpe/22/orig -> origin/gh/rtimpe/22/orig 2025-12-04T08:57:43.9296424Z * [new branch] gh/rtimpe/23/base -> origin/gh/rtimpe/23/base 2025-12-04T08:57:43.9297852Z * [new branch] gh/rtimpe/23/head -> origin/gh/rtimpe/23/head 2025-12-04T08:57:43.9299008Z * [new branch] gh/rtimpe/23/orig -> origin/gh/rtimpe/23/orig 2025-12-04T08:57:43.9300367Z * [new branch] gh/rtimpe/24/base -> origin/gh/rtimpe/24/base 2025-12-04T08:57:43.9301475Z * [new branch] gh/rtimpe/24/head -> origin/gh/rtimpe/24/head 2025-12-04T08:57:43.9302624Z * [new branch] gh/rtimpe/24/orig -> origin/gh/rtimpe/24/orig 2025-12-04T08:57:43.9304144Z * [new branch] gh/rtimpe/25/base -> origin/gh/rtimpe/25/base 2025-12-04T08:57:43.9305238Z * [new branch] gh/rtimpe/25/head -> origin/gh/rtimpe/25/head 2025-12-04T08:57:43.9306387Z * [new branch] gh/rtimpe/25/orig -> origin/gh/rtimpe/25/orig 2025-12-04T08:57:43.9307842Z * [new branch] gh/rtimpe/26/base -> origin/gh/rtimpe/26/base 2025-12-04T08:57:43.9309042Z * [new branch] gh/rtimpe/26/head -> origin/gh/rtimpe/26/head 2025-12-04T08:57:43.9310181Z * [new branch] gh/rtimpe/26/orig -> origin/gh/rtimpe/26/orig 2025-12-04T08:57:43.9311644Z * [new branch] gh/rtimpe/27/base -> origin/gh/rtimpe/27/base 2025-12-04T08:57:43.9312717Z * [new branch] gh/rtimpe/27/head -> origin/gh/rtimpe/27/head 2025-12-04T08:57:43.9313802Z * [new branch] gh/rtimpe/27/orig -> origin/gh/rtimpe/27/orig 2025-12-04T08:57:43.9315638Z * [new branch] gh/rtimpe/28/base -> origin/gh/rtimpe/28/base 2025-12-04T08:57:43.9316701Z * [new branch] gh/rtimpe/28/head -> origin/gh/rtimpe/28/head 2025-12-04T08:57:43.9317766Z * [new branch] gh/rtimpe/28/orig -> origin/gh/rtimpe/28/orig 2025-12-04T08:57:43.9319343Z * [new branch] gh/rtimpe/29/base -> origin/gh/rtimpe/29/base 2025-12-04T08:57:43.9320397Z * [new branch] gh/rtimpe/29/head -> origin/gh/rtimpe/29/head 2025-12-04T08:57:43.9321936Z * [new branch] gh/rtimpe/29/orig -> 
origin/gh/rtimpe/29/orig 2025-12-04T08:57:43.9323408Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-12-04T08:57:43.9324470Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-12-04T08:57:43.9325924Z * [new branch] gh/rtimpe/30/base -> origin/gh/rtimpe/30/base 2025-12-04T08:57:43.9327016Z * [new branch] gh/rtimpe/30/head -> origin/gh/rtimpe/30/head 2025-12-04T08:57:43.9328149Z * [new branch] gh/rtimpe/30/orig -> origin/gh/rtimpe/30/orig 2025-12-04T08:57:43.9329596Z * [new branch] gh/rtimpe/31/base -> origin/gh/rtimpe/31/base 2025-12-04T08:57:43.9330690Z * [new branch] gh/rtimpe/31/head -> origin/gh/rtimpe/31/head 2025-12-04T08:57:43.9331897Z * [new branch] gh/rtimpe/31/orig -> origin/gh/rtimpe/31/orig 2025-12-04T08:57:43.9333585Z * [new branch] gh/rtimpe/32/base -> origin/gh/rtimpe/32/base 2025-12-04T08:57:43.9334656Z * [new branch] gh/rtimpe/32/head -> origin/gh/rtimpe/32/head 2025-12-04T08:57:43.9335726Z * [new branch] gh/rtimpe/32/orig -> origin/gh/rtimpe/32/orig 2025-12-04T08:57:43.9337563Z * [new branch] gh/rtimpe/33/base -> origin/gh/rtimpe/33/base 2025-12-04T08:57:43.9338653Z * [new branch] gh/rtimpe/33/head -> origin/gh/rtimpe/33/head 2025-12-04T08:57:43.9339774Z * [new branch] gh/rtimpe/33/orig -> origin/gh/rtimpe/33/orig 2025-12-04T08:57:43.9341200Z * [new branch] gh/rtimpe/34/base -> origin/gh/rtimpe/34/base 2025-12-04T08:57:43.9342310Z * [new branch] gh/rtimpe/34/head -> origin/gh/rtimpe/34/head 2025-12-04T08:57:43.9343423Z * [new branch] gh/rtimpe/34/orig -> origin/gh/rtimpe/34/orig 2025-12-04T08:57:43.9345071Z * [new branch] gh/rtimpe/35/base -> origin/gh/rtimpe/35/base 2025-12-04T08:57:43.9346109Z * [new branch] gh/rtimpe/35/head -> origin/gh/rtimpe/35/head 2025-12-04T08:57:43.9347293Z * [new branch] gh/rtimpe/35/orig -> origin/gh/rtimpe/35/orig 2025-12-04T08:57:43.9348968Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-12-04T08:57:43.9350046Z * [new branch] gh/rtimpe/4/head -> origin/gh/rtimpe/4/head 2025-12-04T08:57:43.9351850Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-12-04T08:57:43.9352925Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 2025-12-04T08:57:43.9354009Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-12-04T08:57:43.9355471Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-12-04T08:57:43.9356546Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-12-04T08:57:43.9357657Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-12-04T08:57:43.9359127Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-12-04T08:57:43.9360738Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-12-04T08:57:43.9361831Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-12-04T08:57:43.9363480Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-12-04T08:57:43.9364544Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-12-04T08:57:43.9365609Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-12-04T08:57:43.9367043Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-12-04T08:57:43.9368205Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-12-04T08:57:43.9369264Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 
2025-12-04T08:57:43.9370649Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-12-04T08:57:43.9371725Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-12-04T08:57:43.9372823Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-12-04T08:57:43.9374228Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-12-04T08:57:43.9375316Z * [new branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-12-04T08:57:43.9376448Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-12-04T08:57:43.9378672Z * [new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-12-04T08:57:43.9379800Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-12-04T08:57:43.9380969Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-12-04T08:57:43.9382455Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-12-04T08:57:43.9383578Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-12-04T08:57:43.9384694Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-12-04T08:57:43.9386175Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-12-04T08:57:43.9387304Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-12-04T08:57:43.9388480Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-12-04T08:57:43.9390030Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-12-04T08:57:43.9390927Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-12-04T08:57:43.9392081Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-12-04T08:57:43.9393515Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-12-04T08:57:43.9394667Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-12-04T08:57:43.9395840Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-12-04T08:57:43.9397288Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-12-04T08:57:43.9398380Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-12-04T08:57:43.9399469Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-12-04T08:57:43.9400862Z * [new branch] gh/seemethere/63/base -> origin/gh/seemethere/63/base 2025-12-04T08:57:43.9401943Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-12-04T08:57:43.9403072Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-12-04T08:57:43.9404512Z * [new branch] gh/seemethere/71/base -> origin/gh/seemethere/71/base 2025-12-04T08:57:43.9405635Z * [new branch] gh/seemethere/71/head -> origin/gh/seemethere/71/head 2025-12-04T08:57:43.9406747Z * [new branch] gh/seemethere/71/orig -> origin/gh/seemethere/71/orig 2025-12-04T08:57:43.9408361Z * [new branch] gh/seemethere/72/base -> origin/gh/seemethere/72/base 2025-12-04T08:57:43.9409438Z * [new branch] gh/seemethere/72/head -> origin/gh/seemethere/72/head 2025-12-04T08:57:43.9411034Z * [new branch] gh/seemethere/72/orig -> origin/gh/seemethere/72/orig 2025-12-04T08:57:43.9412496Z * [new branch] gh/seemethere/73/base -> origin/gh/seemethere/73/base 2025-12-04T08:57:43.9413612Z * [new branch] gh/seemethere/73/head -> origin/gh/seemethere/73/head 2025-12-04T08:57:43.9414763Z * [new branch] gh/seemethere/73/orig -> origin/gh/seemethere/73/orig 
2025-12-04T08:57:43.9416207Z * [new branch] gh/seemethere/74/base -> origin/gh/seemethere/74/base 2025-12-04T08:57:43.9417677Z * [new branch] gh/seemethere/74/head -> origin/gh/seemethere/74/head 2025-12-04T08:57:43.9418785Z * [new branch] gh/seemethere/74/orig -> origin/gh/seemethere/74/orig 2025-12-04T08:57:43.9420293Z * [new branch] gh/seemethere/75/base -> origin/gh/seemethere/75/base 2025-12-04T08:57:43.9423325Z * [new branch] gh/seemethere/75/head -> origin/gh/seemethere/75/head 2025-12-04T08:57:43.9424599Z * [new branch] gh/seemethere/75/orig -> origin/gh/seemethere/75/orig 2025-12-04T08:57:43.9426287Z * [new branch] gh/seemethere/76/base -> origin/gh/seemethere/76/base 2025-12-04T08:57:43.9427398Z * [new branch] gh/seemethere/76/head -> origin/gh/seemethere/76/head 2025-12-04T08:57:43.9428578Z * [new branch] gh/seemethere/76/orig -> origin/gh/seemethere/76/orig 2025-12-04T08:57:43.9430586Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-12-04T08:57:43.9431821Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-12-04T08:57:43.9433126Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-12-04T08:57:43.9434818Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-12-04T08:57:43.9436831Z * [new branch] gh/shunting314/176/head -> origin/gh/shunting314/176/head 2025-12-04T08:57:43.9437875Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-12-04T08:57:43.9439426Z * [new branch] gh/shunting314/249/base -> origin/gh/shunting314/249/base 2025-12-04T08:57:43.9440635Z * [new branch] gh/shunting314/249/head -> origin/gh/shunting314/249/head 2025-12-04T08:57:43.9441911Z * [new branch] gh/shunting314/249/orig -> origin/gh/shunting314/249/orig 2025-12-04T08:57:43.9443448Z * [new branch] gh/shunting314/253/base -> origin/gh/shunting314/253/base 2025-12-04T08:57:43.9444469Z * [new branch] gh/shunting314/253/head -> origin/gh/shunting314/253/head 2025-12-04T08:57:43.9445552Z * [new branch] gh/shunting314/253/orig -> origin/gh/shunting314/253/orig 2025-12-04T08:57:43.9447073Z * [new branch] gh/shunting314/256/base -> origin/gh/shunting314/256/base 2025-12-04T08:57:43.9448184Z * [new branch] gh/shunting314/256/head -> origin/gh/shunting314/256/head 2025-12-04T08:57:43.9449253Z * [new branch] gh/shunting314/256/orig -> origin/gh/shunting314/256/orig 2025-12-04T08:57:43.9451074Z * [new branch] gh/shunting314/257/base -> origin/gh/shunting314/257/base 2025-12-04T08:57:43.9452210Z * [new branch] gh/shunting314/257/head -> origin/gh/shunting314/257/head 2025-12-04T08:57:43.9453267Z * [new branch] gh/shunting314/257/orig -> origin/gh/shunting314/257/orig 2025-12-04T08:57:43.9454917Z * [new branch] gh/shunting314/258/base -> origin/gh/shunting314/258/base 2025-12-04T08:57:43.9456000Z * [new branch] gh/shunting314/258/head -> origin/gh/shunting314/258/head 2025-12-04T08:57:43.9457666Z * [new branch] gh/shunting314/258/orig -> origin/gh/shunting314/258/orig 2025-12-04T08:57:43.9458931Z * [new branch] gh/shunting314/259/base -> origin/gh/shunting314/259/base 2025-12-04T08:57:43.9460043Z * [new branch] gh/shunting314/259/head -> origin/gh/shunting314/259/head 2025-12-04T08:57:43.9461166Z * [new branch] gh/shunting314/259/orig -> origin/gh/shunting314/259/orig 2025-12-04T08:57:43.9462747Z * [new branch] gh/shunting314/260/base -> origin/gh/shunting314/260/base 2025-12-04T08:57:43.9463916Z * [new branch] gh/shunting314/260/head -> origin/gh/shunting314/260/head 
2025-12-04T08:57:43.9465089Z * [new branch] gh/shunting314/260/orig -> origin/gh/shunting314/260/orig 2025-12-04T08:57:43.9466675Z * [new branch] gh/shunting314/261/base -> origin/gh/shunting314/261/base 2025-12-04T08:57:43.9467906Z * [new branch] gh/shunting314/261/head -> origin/gh/shunting314/261/head 2025-12-04T08:57:43.9469124Z * [new branch] gh/shunting314/261/orig -> origin/gh/shunting314/261/orig 2025-12-04T08:57:43.9470781Z * [new branch] gh/shunting314/262/base -> origin/gh/shunting314/262/base 2025-12-04T08:57:43.9471929Z * [new branch] gh/shunting314/262/head -> origin/gh/shunting314/262/head 2025-12-04T08:57:43.9473136Z * [new branch] gh/shunting314/262/orig -> origin/gh/shunting314/262/orig 2025-12-04T08:57:43.9474695Z * [new branch] gh/shunting314/263/base -> origin/gh/shunting314/263/base 2025-12-04T08:57:43.9475911Z * [new branch] gh/shunting314/263/head -> origin/gh/shunting314/263/head 2025-12-04T08:57:43.9477058Z * [new branch] gh/shunting314/263/orig -> origin/gh/shunting314/263/orig 2025-12-04T08:57:43.9478495Z * [new branch] gh/shunting314/264/base -> origin/gh/shunting314/264/base 2025-12-04T08:57:43.9479585Z * [new branch] gh/shunting314/264/head -> origin/gh/shunting314/264/head 2025-12-04T08:57:43.9480770Z * [new branch] gh/shunting314/264/orig -> origin/gh/shunting314/264/orig 2025-12-04T08:57:43.9482266Z * [new branch] gh/shunting314/265/base -> origin/gh/shunting314/265/base 2025-12-04T08:57:43.9483207Z * [new branch] gh/shunting314/265/head -> origin/gh/shunting314/265/head 2025-12-04T08:57:43.9484377Z * [new branch] gh/shunting314/265/orig -> origin/gh/shunting314/265/orig 2025-12-04T08:57:43.9485817Z * [new branch] gh/shunting314/266/base -> origin/gh/shunting314/266/base 2025-12-04T08:57:43.9487075Z * [new branch] gh/shunting314/266/head -> origin/gh/shunting314/266/head 2025-12-04T08:57:43.9488300Z * [new branch] gh/shunting314/266/orig -> origin/gh/shunting314/266/orig 2025-12-04T08:57:43.9489924Z * [new branch] gh/shunting314/267/base -> origin/gh/shunting314/267/base 2025-12-04T08:57:43.9491259Z * [new branch] gh/shunting314/267/head -> origin/gh/shunting314/267/head 2025-12-04T08:57:43.9492346Z * [new branch] gh/shunting314/267/orig -> origin/gh/shunting314/267/orig 2025-12-04T08:57:43.9494353Z * [new branch] gh/shunting314/268/base -> origin/gh/shunting314/268/base 2025-12-04T08:57:43.9495596Z * [new branch] gh/shunting314/268/head -> origin/gh/shunting314/268/head 2025-12-04T08:57:43.9496963Z * [new branch] gh/shunting314/268/orig -> origin/gh/shunting314/268/orig 2025-12-04T08:57:43.9498677Z * [new branch] gh/shunting314/269/base -> origin/gh/shunting314/269/base 2025-12-04T08:57:43.9500201Z * [new branch] gh/shunting314/269/head -> origin/gh/shunting314/269/head 2025-12-04T08:57:43.9501363Z * [new branch] gh/shunting314/269/orig -> origin/gh/shunting314/269/orig 2025-12-04T08:57:43.9503139Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-12-04T08:57:43.9504357Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-12-04T08:57:43.9505688Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-12-04T08:57:43.9506794Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-12-04T08:57:43.9508126Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 2025-12-04T08:57:43.9509386Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-12-04T08:57:43.9510684Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-12-04T08:57:43.9511743Z * [new 
branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-12-04T08:57:43.9513467Z * [new branch] gh/slayton58/39/base -> origin/gh/slayton58/39/base 2025-12-04T08:57:43.9514572Z * [new branch] gh/slayton58/39/head -> origin/gh/slayton58/39/head 2025-12-04T08:57:43.9515680Z * [new branch] gh/slayton58/39/orig -> origin/gh/slayton58/39/orig 2025-12-04T08:57:43.9517231Z * [new branch] gh/slayton58/42/base -> origin/gh/slayton58/42/base 2025-12-04T08:57:43.9518346Z * [new branch] gh/slayton58/42/head -> origin/gh/slayton58/42/head 2025-12-04T08:57:43.9519570Z * [new branch] gh/slayton58/42/orig -> origin/gh/slayton58/42/orig 2025-12-04T08:57:43.9521104Z * [new branch] gh/slayton58/43/base -> origin/gh/slayton58/43/base 2025-12-04T08:57:43.9522569Z * [new branch] gh/slayton58/43/head -> origin/gh/slayton58/43/head 2025-12-04T08:57:43.9523741Z * [new branch] gh/slayton58/43/orig -> origin/gh/slayton58/43/orig 2025-12-04T08:57:43.9525772Z * [new branch] gh/slayton58/44/base -> origin/gh/slayton58/44/base 2025-12-04T08:57:43.9526959Z * [new branch] gh/slayton58/44/head -> origin/gh/slayton58/44/head 2025-12-04T08:57:43.9528253Z * [new branch] gh/slayton58/44/orig -> origin/gh/slayton58/44/orig 2025-12-04T08:57:43.9529638Z * [new branch] gh/slayton58/45/base -> origin/gh/slayton58/45/base 2025-12-04T08:57:43.9530685Z * [new branch] gh/slayton58/45/head -> origin/gh/slayton58/45/head 2025-12-04T08:57:43.9531877Z * [new branch] gh/slayton58/45/orig -> origin/gh/slayton58/45/orig 2025-12-04T08:57:43.9533598Z * [new branch] gh/slayton58/46/base -> origin/gh/slayton58/46/base 2025-12-04T08:57:43.9534818Z * [new branch] gh/slayton58/46/head -> origin/gh/slayton58/46/head 2025-12-04T08:57:43.9535964Z * [new branch] gh/slayton58/46/orig -> origin/gh/slayton58/46/orig 2025-12-04T08:57:43.9537802Z * [new branch] gh/slayton58/6/base -> origin/gh/slayton58/6/base 2025-12-04T08:57:43.9538971Z * [new branch] gh/slayton58/6/head -> origin/gh/slayton58/6/head 2025-12-04T08:57:43.9540353Z * [new branch] gh/slayton58/7/base -> origin/gh/slayton58/7/base 2025-12-04T08:57:43.9541370Z * [new branch] gh/slayton58/7/head -> origin/gh/slayton58/7/head 2025-12-04T08:57:43.9543383Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-12-04T08:57:43.9544547Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-12-04T08:57:43.9545683Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-12-04T08:57:43.9547288Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-12-04T08:57:43.9548473Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-12-04T08:57:43.9549672Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-12-04T08:57:43.9551462Z * [new branch] gh/soulitzer/287/base -> origin/gh/soulitzer/287/base 2025-12-04T08:57:43.9552604Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-12-04T08:57:43.9553688Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-12-04T08:57:43.9555299Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-12-04T08:57:43.9556388Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-12-04T08:57:43.9557475Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-12-04T08:57:43.9559041Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-12-04T08:57:43.9560224Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 
2025-12-04T08:57:43.9561333Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-12-04T08:57:43.9562949Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 2025-12-04T08:57:43.9564206Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-12-04T08:57:43.9565285Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-12-04T08:57:43.9566904Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-12-04T08:57:43.9568020Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-12-04T08:57:43.9569142Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-12-04T08:57:43.9570621Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-12-04T08:57:43.9571689Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-12-04T08:57:43.9572802Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-12-04T08:57:43.9574286Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-12-04T08:57:43.9575310Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-12-04T08:57:43.9576456Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 2025-12-04T08:57:43.9578346Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-12-04T08:57:43.9580010Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-12-04T08:57:43.9581635Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-12-04T08:57:43.9583720Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-12-04T08:57:43.9584786Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-12-04T08:57:43.9585902Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-12-04T08:57:43.9587526Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-12-04T08:57:43.9588531Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-12-04T08:57:43.9589689Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-12-04T08:57:43.9591422Z * [new branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-12-04T08:57:43.9592552Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-12-04T08:57:43.9593653Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-12-04T08:57:43.9595066Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-12-04T08:57:43.9596229Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-12-04T08:57:43.9597310Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-12-04T08:57:43.9598836Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-12-04T08:57:43.9599905Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-12-04T08:57:43.9600983Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-12-04T08:57:43.9602401Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-12-04T08:57:43.9603656Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-12-04T08:57:43.9604733Z * [new branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-12-04T08:57:43.9607280Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-12-04T08:57:43.9608568Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 
2025-12-04T08:57:43.9609670Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-12-04T08:57:43.9611707Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-12-04T08:57:43.9612916Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-12-04T08:57:43.9614095Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-12-04T08:57:43.9615676Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-12-04T08:57:43.9617132Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-12-04T08:57:43.9618293Z * [new branch] gh/soulitzer/374/orig -> origin/gh/soulitzer/374/orig 2025-12-04T08:57:43.9619816Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-12-04T08:57:43.9621038Z * [new branch] gh/soulitzer/375/head -> origin/gh/soulitzer/375/head 2025-12-04T08:57:43.9622901Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-12-04T08:57:43.9624228Z * [new branch] gh/soulitzer/380/base -> origin/gh/soulitzer/380/base 2025-12-04T08:57:43.9625340Z * [new branch] gh/soulitzer/380/head -> origin/gh/soulitzer/380/head 2025-12-04T08:57:43.9626435Z * [new branch] gh/soulitzer/380/orig -> origin/gh/soulitzer/380/orig 2025-12-04T08:57:43.9627959Z * [new branch] gh/soulitzer/385/base -> origin/gh/soulitzer/385/base 2025-12-04T08:57:43.9629140Z * [new branch] gh/soulitzer/385/head -> origin/gh/soulitzer/385/head 2025-12-04T08:57:43.9630263Z * [new branch] gh/soulitzer/385/orig -> origin/gh/soulitzer/385/orig 2025-12-04T08:57:43.9631945Z * [new branch] gh/soulitzer/386/base -> origin/gh/soulitzer/386/base 2025-12-04T08:57:43.9633188Z * [new branch] gh/soulitzer/386/head -> origin/gh/soulitzer/386/head 2025-12-04T08:57:43.9634726Z * [new branch] gh/soulitzer/386/orig -> origin/gh/soulitzer/386/orig 2025-12-04T08:57:43.9635782Z * [new branch] gh/soulitzer/387/base -> origin/gh/soulitzer/387/base 2025-12-04T08:57:43.9636858Z * [new branch] gh/soulitzer/387/head -> origin/gh/soulitzer/387/head 2025-12-04T08:57:43.9637931Z * [new branch] gh/soulitzer/387/orig -> origin/gh/soulitzer/387/orig 2025-12-04T08:57:43.9639499Z * [new branch] gh/soulitzer/388/base -> origin/gh/soulitzer/388/base 2025-12-04T08:57:43.9640505Z * [new branch] gh/soulitzer/388/head -> origin/gh/soulitzer/388/head 2025-12-04T08:57:43.9641564Z * [new branch] gh/soulitzer/388/orig -> origin/gh/soulitzer/388/orig 2025-12-04T08:57:43.9643074Z * [new branch] gh/soulitzer/389/base -> origin/gh/soulitzer/389/base 2025-12-04T08:57:43.9644227Z * [new branch] gh/soulitzer/389/head -> origin/gh/soulitzer/389/head 2025-12-04T08:57:43.9645293Z * [new branch] gh/soulitzer/389/orig -> origin/gh/soulitzer/389/orig 2025-12-04T08:57:43.9647675Z * [new branch] gh/soulitzer/390/base -> origin/gh/soulitzer/390/base 2025-12-04T08:57:43.9649220Z * [new branch] gh/soulitzer/390/head -> origin/gh/soulitzer/390/head 2025-12-04T08:57:43.9650218Z * [new branch] gh/soulitzer/390/orig -> origin/gh/soulitzer/390/orig 2025-12-04T08:57:43.9651779Z * [new branch] gh/soulitzer/391/base -> origin/gh/soulitzer/391/base 2025-12-04T08:57:43.9652785Z * [new branch] gh/soulitzer/391/head -> origin/gh/soulitzer/391/head 2025-12-04T08:57:43.9653868Z * [new branch] gh/soulitzer/391/orig -> origin/gh/soulitzer/391/orig 2025-12-04T08:57:43.9655899Z * [new branch] gh/soulitzer/392/base -> origin/gh/soulitzer/392/base 2025-12-04T08:57:43.9657829Z * [new branch] gh/soulitzer/392/head -> origin/gh/soulitzer/392/head 
2025-12-04T08:57:43.9658927Z * [new branch] gh/soulitzer/392/orig -> origin/gh/soulitzer/392/orig 2025-12-04T08:57:43.9660926Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-12-04T08:57:43.9662679Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-12-04T08:57:43.9663907Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-12-04T08:57:43.9665289Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-12-04T08:57:43.9666793Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-12-04T08:57:43.9667843Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-12-04T08:57:43.9669043Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-12-04T08:57:43.9670707Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-12-04T08:57:43.9671564Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-12-04T08:57:43.9672651Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-12-04T08:57:43.9674268Z * [new branch] gh/swolchok/839/base -> origin/gh/swolchok/839/base 2025-12-04T08:57:43.9675706Z * [new branch] gh/swolchok/839/head -> origin/gh/swolchok/839/head 2025-12-04T08:57:43.9676824Z * [new branch] gh/swolchok/839/orig -> origin/gh/swolchok/839/orig 2025-12-04T08:57:43.9678344Z * [new branch] gh/swolchok/841/base -> origin/gh/swolchok/841/base 2025-12-04T08:57:43.9679392Z * [new branch] gh/swolchok/841/head -> origin/gh/swolchok/841/head 2025-12-04T08:57:43.9681089Z * [new branch] gh/swolchok/841/orig -> origin/gh/swolchok/841/orig 2025-12-04T08:57:43.9682590Z * [new branch] gh/swolchok/842/base -> origin/gh/swolchok/842/base 2025-12-04T08:57:43.9683627Z * [new branch] gh/swolchok/842/head -> origin/gh/swolchok/842/head 2025-12-04T08:57:43.9684714Z * [new branch] gh/swolchok/842/orig -> origin/gh/swolchok/842/orig 2025-12-04T08:57:43.9686250Z * [new branch] gh/swolchok/845/base -> origin/gh/swolchok/845/base 2025-12-04T08:57:43.9687268Z * [new branch] gh/swolchok/845/head -> origin/gh/swolchok/845/head 2025-12-04T08:57:43.9688451Z * [new branch] gh/swolchok/845/orig -> origin/gh/swolchok/845/orig 2025-12-04T08:57:43.9690022Z * [new branch] gh/swolchok/848/base -> origin/gh/swolchok/848/base 2025-12-04T08:57:43.9691219Z * [new branch] gh/swolchok/848/head -> origin/gh/swolchok/848/head 2025-12-04T08:57:43.9692534Z * [new branch] gh/swolchok/848/orig -> origin/gh/swolchok/848/orig 2025-12-04T08:57:43.9694409Z * [new branch] gh/swolchok/856/base -> origin/gh/swolchok/856/base 2025-12-04T08:57:43.9695420Z * [new branch] gh/swolchok/856/head -> origin/gh/swolchok/856/head 2025-12-04T08:57:43.9696577Z * [new branch] gh/swolchok/856/orig -> origin/gh/swolchok/856/orig 2025-12-04T08:57:43.9698478Z * [new branch] gh/swolchok/860/base -> origin/gh/swolchok/860/base 2025-12-04T08:57:43.9699581Z * [new branch] gh/swolchok/860/head -> origin/gh/swolchok/860/head 2025-12-04T08:57:43.9700852Z * [new branch] gh/swolchok/860/orig -> origin/gh/swolchok/860/orig 2025-12-04T08:57:43.9702608Z * [new branch] gh/swolchok/861/base -> origin/gh/swolchok/861/base 2025-12-04T08:57:43.9703757Z * [new branch] gh/swolchok/861/head -> origin/gh/swolchok/861/head 2025-12-04T08:57:43.9704907Z * [new branch] gh/swolchok/861/orig -> origin/gh/swolchok/861/orig 2025-12-04T08:57:43.9706526Z * [new branch] gh/swolchok/862/base -> origin/gh/swolchok/862/base 2025-12-04T08:57:43.9707559Z * [new branch] gh/swolchok/862/head -> origin/gh/swolchok/862/head 
2025-12-04T08:57:43.9708902Z * [new branch] gh/swolchok/862/orig -> origin/gh/swolchok/862/orig 2025-12-04T08:57:43.9710567Z * [new branch] gh/swolchok/863/base -> origin/gh/swolchok/863/base 2025-12-04T08:57:43.9711597Z * [new branch] gh/swolchok/863/head -> origin/gh/swolchok/863/head 2025-12-04T08:57:43.9712775Z * [new branch] gh/swolchok/863/orig -> origin/gh/swolchok/863/orig 2025-12-04T08:57:43.9714375Z * [new branch] gh/swolchok/864/base -> origin/gh/swolchok/864/base 2025-12-04T08:57:43.9715408Z * [new branch] gh/swolchok/864/head -> origin/gh/swolchok/864/head 2025-12-04T08:57:43.9716634Z * [new branch] gh/swolchok/864/orig -> origin/gh/swolchok/864/orig 2025-12-04T08:57:43.9718044Z * [new branch] gh/swolchok/865/base -> origin/gh/swolchok/865/base 2025-12-04T08:57:43.9719385Z * [new branch] gh/swolchok/865/head -> origin/gh/swolchok/865/head 2025-12-04T08:57:43.9720415Z * [new branch] gh/swolchok/865/orig -> origin/gh/swolchok/865/orig 2025-12-04T08:57:43.9722893Z * [new branch] gh/swolchok/866/base -> origin/gh/swolchok/866/base 2025-12-04T08:57:43.9724020Z * [new branch] gh/swolchok/866/head -> origin/gh/swolchok/866/head 2025-12-04T08:57:43.9725388Z * [new branch] gh/swolchok/866/orig -> origin/gh/swolchok/866/orig 2025-12-04T08:57:43.9726857Z * [new branch] gh/swolchok/867/base -> origin/gh/swolchok/867/base 2025-12-04T08:57:43.9727983Z * [new branch] gh/swolchok/867/head -> origin/gh/swolchok/867/head 2025-12-04T08:57:43.9729171Z * [new branch] gh/swolchok/867/orig -> origin/gh/swolchok/867/orig 2025-12-04T08:57:43.9730758Z * [new branch] gh/swolchok/868/base -> origin/gh/swolchok/868/base 2025-12-04T08:57:43.9731824Z * [new branch] gh/swolchok/868/head -> origin/gh/swolchok/868/head 2025-12-04T08:57:43.9732953Z * [new branch] gh/swolchok/868/orig -> origin/gh/swolchok/868/orig 2025-12-04T08:57:43.9734593Z * [new branch] gh/swolchok/869/base -> origin/gh/swolchok/869/base 2025-12-04T08:57:43.9735738Z * [new branch] gh/swolchok/869/head -> origin/gh/swolchok/869/head 2025-12-04T08:57:43.9737198Z * [new branch] gh/swolchok/869/orig -> origin/gh/swolchok/869/orig 2025-12-04T08:57:43.9738900Z * [new branch] gh/swolchok/870/base -> origin/gh/swolchok/870/base 2025-12-04T08:57:43.9739920Z * [new branch] gh/swolchok/870/head -> origin/gh/swolchok/870/head 2025-12-04T08:57:43.9741125Z * [new branch] gh/swolchok/870/orig -> origin/gh/swolchok/870/orig 2025-12-04T08:57:43.9742709Z * [new branch] gh/swolchok/871/base -> origin/gh/swolchok/871/base 2025-12-04T08:57:43.9743904Z * [new branch] gh/swolchok/871/head -> origin/gh/swolchok/871/head 2025-12-04T08:57:43.9745619Z * [new branch] gh/swolchok/871/orig -> origin/gh/swolchok/871/orig 2025-12-04T08:57:43.9747559Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-12-04T08:57:43.9748769Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-12-04T08:57:43.9749893Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-12-04T08:57:43.9751695Z * [new branch] gh/tianyu-l/2/base -> origin/gh/tianyu-l/2/base 2025-12-04T08:57:43.9752740Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-12-04T08:57:43.9753815Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-12-04T08:57:43.9755404Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-12-04T08:57:43.9756466Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-12-04T08:57:43.9757923Z * [new branch] gh/tianyu-l/4/base -> origin/gh/tianyu-l/4/base 2025-12-04T08:57:43.9758931Z * [new 
branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-12-04T08:57:43.9760027Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-12-04T08:57:43.9762278Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-12-04T08:57:43.9763348Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-12-04T08:57:43.9764664Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-12-04T08:57:43.9765980Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-12-04T08:57:43.9767189Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-12-04T08:57:43.9768378Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-12-04T08:57:43.9770125Z * [new branch] gh/tugsbayasgalan/17/base -> origin/gh/tugsbayasgalan/17/base 2025-12-04T08:57:43.9771092Z * [new branch] gh/tugsbayasgalan/17/head -> origin/gh/tugsbayasgalan/17/head 2025-12-04T08:57:43.9772223Z * [new branch] gh/tugsbayasgalan/17/orig -> origin/gh/tugsbayasgalan/17/orig 2025-12-04T08:57:43.9773844Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-12-04T08:57:43.9774845Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-12-04T08:57:43.9775933Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-12-04T08:57:43.9778211Z * [new branch] gh/tugsbayasgalan/28/base -> origin/gh/tugsbayasgalan/28/base 2025-12-04T08:57:43.9779309Z * [new branch] gh/tugsbayasgalan/28/head -> origin/gh/tugsbayasgalan/28/head 2025-12-04T08:57:43.9780409Z * [new branch] gh/tugsbayasgalan/28/orig -> origin/gh/tugsbayasgalan/28/orig 2025-12-04T08:57:43.9782000Z * [new branch] gh/tugsbayasgalan/32/base -> origin/gh/tugsbayasgalan/32/base 2025-12-04T08:57:43.9783070Z * [new branch] gh/tugsbayasgalan/32/head -> origin/gh/tugsbayasgalan/32/head 2025-12-04T08:57:43.9784180Z * [new branch] gh/tugsbayasgalan/32/orig -> origin/gh/tugsbayasgalan/32/orig 2025-12-04T08:57:43.9785935Z * [new branch] gh/tugsbayasgalan/35/base -> origin/gh/tugsbayasgalan/35/base 2025-12-04T08:57:43.9787100Z * [new branch] gh/tugsbayasgalan/35/head -> origin/gh/tugsbayasgalan/35/head 2025-12-04T08:57:43.9788208Z * [new branch] gh/tugsbayasgalan/35/orig -> origin/gh/tugsbayasgalan/35/orig 2025-12-04T08:57:43.9789836Z * [new branch] gh/tugsbayasgalan/36/base -> origin/gh/tugsbayasgalan/36/base 2025-12-04T08:57:43.9790861Z * [new branch] gh/tugsbayasgalan/36/head -> origin/gh/tugsbayasgalan/36/head 2025-12-04T08:57:43.9791992Z * [new branch] gh/tugsbayasgalan/36/orig -> origin/gh/tugsbayasgalan/36/orig 2025-12-04T08:57:43.9793490Z * [new branch] gh/tugsbayasgalan/37/base -> origin/gh/tugsbayasgalan/37/base 2025-12-04T08:57:43.9794498Z * [new branch] gh/tugsbayasgalan/37/head -> origin/gh/tugsbayasgalan/37/head 2025-12-04T08:57:43.9795588Z * [new branch] gh/tugsbayasgalan/37/orig -> origin/gh/tugsbayasgalan/37/orig 2025-12-04T08:57:43.9797077Z * [new branch] gh/tugsbayasgalan/43/base -> origin/gh/tugsbayasgalan/43/base 2025-12-04T08:57:43.9798729Z * [new branch] gh/tugsbayasgalan/43/head -> origin/gh/tugsbayasgalan/43/head 2025-12-04T08:57:43.9799757Z * [new branch] gh/tugsbayasgalan/43/orig -> origin/gh/tugsbayasgalan/43/orig 2025-12-04T08:57:43.9801250Z * [new branch] gh/tugsbayasgalan/48/base -> origin/gh/tugsbayasgalan/48/base 2025-12-04T08:57:43.9802287Z * [new branch] gh/tugsbayasgalan/48/head -> origin/gh/tugsbayasgalan/48/head 
2025-12-04T08:57:43.9803367Z * [new branch] gh/tugsbayasgalan/48/orig -> origin/gh/tugsbayasgalan/48/orig 2025-12-04T08:57:43.9804937Z * [new branch] gh/tugsbayasgalan/51/base -> origin/gh/tugsbayasgalan/51/base 2025-12-04T08:57:43.9805974Z * [new branch] gh/tugsbayasgalan/51/head -> origin/gh/tugsbayasgalan/51/head 2025-12-04T08:57:43.9807123Z * [new branch] gh/tugsbayasgalan/51/orig -> origin/gh/tugsbayasgalan/51/orig 2025-12-04T08:57:43.9808838Z * [new branch] gh/tugsbayasgalan/52/base -> origin/gh/tugsbayasgalan/52/base 2025-12-04T08:57:43.9809881Z * [new branch] gh/tugsbayasgalan/52/head -> origin/gh/tugsbayasgalan/52/head 2025-12-04T08:57:43.9811487Z * [new branch] gh/tugsbayasgalan/52/orig -> origin/gh/tugsbayasgalan/52/orig 2025-12-04T08:57:43.9812973Z * [new branch] gh/tugsbayasgalan/53/base -> origin/gh/tugsbayasgalan/53/base 2025-12-04T08:57:43.9814012Z * [new branch] gh/tugsbayasgalan/53/head -> origin/gh/tugsbayasgalan/53/head 2025-12-04T08:57:43.9815074Z * [new branch] gh/tugsbayasgalan/53/orig -> origin/gh/tugsbayasgalan/53/orig 2025-12-04T08:57:43.9817125Z * [new branch] gh/tugsbayasgalan/55/base -> origin/gh/tugsbayasgalan/55/base 2025-12-04T08:57:43.9818421Z * [new branch] gh/tugsbayasgalan/55/head -> origin/gh/tugsbayasgalan/55/head 2025-12-04T08:57:43.9819631Z * [new branch] gh/tugsbayasgalan/55/orig -> origin/gh/tugsbayasgalan/55/orig 2025-12-04T08:57:43.9823835Z * [new branch] gh/tugsbayasgalan/59/base -> origin/gh/tugsbayasgalan/59/base 2025-12-04T08:57:43.9825185Z * [new branch] gh/tugsbayasgalan/59/head -> origin/gh/tugsbayasgalan/59/head 2025-12-04T08:57:43.9826340Z * [new branch] gh/tugsbayasgalan/59/orig -> origin/gh/tugsbayasgalan/59/orig 2025-12-04T08:57:43.9827931Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-12-04T08:57:43.9828968Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-12-04T08:57:43.9830117Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-12-04T08:57:43.9832044Z * [new branch] gh/tugsbayasgalan/60/base -> origin/gh/tugsbayasgalan/60/base 2025-12-04T08:57:43.9833257Z * [new branch] gh/tugsbayasgalan/60/head -> origin/gh/tugsbayasgalan/60/head 2025-12-04T08:57:43.9834360Z * [new branch] gh/tugsbayasgalan/60/orig -> origin/gh/tugsbayasgalan/60/orig 2025-12-04T08:57:43.9836395Z * [new branch] gh/tugsbayasgalan/61/base -> origin/gh/tugsbayasgalan/61/base 2025-12-04T08:57:43.9839043Z * [new branch] gh/tugsbayasgalan/61/head -> origin/gh/tugsbayasgalan/61/head 2025-12-04T08:57:43.9839748Z * [new branch] gh/tugsbayasgalan/61/orig -> origin/gh/tugsbayasgalan/61/orig 2025-12-04T08:57:43.9840706Z * [new branch] gh/tugsbayasgalan/63/base -> origin/gh/tugsbayasgalan/63/base 2025-12-04T08:57:43.9841772Z * [new branch] gh/tugsbayasgalan/63/head -> origin/gh/tugsbayasgalan/63/head 2025-12-04T08:57:43.9842880Z * [new branch] gh/tugsbayasgalan/63/orig -> origin/gh/tugsbayasgalan/63/orig 2025-12-04T08:57:43.9844379Z * [new branch] gh/tugsbayasgalan/67/base -> origin/gh/tugsbayasgalan/67/base 2025-12-04T08:57:43.9845469Z * [new branch] gh/tugsbayasgalan/67/head -> origin/gh/tugsbayasgalan/67/head 2025-12-04T08:57:43.9846535Z * [new branch] gh/tugsbayasgalan/67/orig -> origin/gh/tugsbayasgalan/67/orig 2025-12-04T08:57:43.9848297Z * [new branch] gh/tugsbayasgalan/68/base -> origin/gh/tugsbayasgalan/68/base 2025-12-04T08:57:43.9849348Z * [new branch] gh/tugsbayasgalan/68/head -> origin/gh/tugsbayasgalan/68/head 2025-12-04T08:57:43.9850470Z * [new branch] 
gh/tugsbayasgalan/68/orig -> origin/gh/tugsbayasgalan/68/orig 2025-12-04T08:57:43.9852158Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-12-04T08:57:43.9853173Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-12-04T08:57:43.9854758Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-12-04T08:57:43.9857001Z * [new branch] gh/tugsbayasgalan/70/base -> origin/gh/tugsbayasgalan/70/base 2025-12-04T08:57:43.9858253Z * [new branch] gh/tugsbayasgalan/70/head -> origin/gh/tugsbayasgalan/70/head 2025-12-04T08:57:43.9859403Z * [new branch] gh/tugsbayasgalan/70/orig -> origin/gh/tugsbayasgalan/70/orig 2025-12-04T08:57:43.9861142Z * [new branch] gh/tugsbayasgalan/71/base -> origin/gh/tugsbayasgalan/71/base 2025-12-04T08:57:43.9862416Z * [new branch] gh/tugsbayasgalan/71/head -> origin/gh/tugsbayasgalan/71/head 2025-12-04T08:57:43.9863610Z * [new branch] gh/tugsbayasgalan/71/orig -> origin/gh/tugsbayasgalan/71/orig 2025-12-04T08:57:43.9865420Z * [new branch] gh/tugsbayasgalan/72/base -> origin/gh/tugsbayasgalan/72/base 2025-12-04T08:57:43.9866534Z * [new branch] gh/tugsbayasgalan/72/head -> origin/gh/tugsbayasgalan/72/head 2025-12-04T08:57:43.9867669Z * [new branch] gh/tugsbayasgalan/72/orig -> origin/gh/tugsbayasgalan/72/orig 2025-12-04T08:57:43.9869498Z * [new branch] gh/tugsbayasgalan/73/base -> origin/gh/tugsbayasgalan/73/base 2025-12-04T08:57:43.9870620Z * [new branch] gh/tugsbayasgalan/73/head -> origin/gh/tugsbayasgalan/73/head 2025-12-04T08:57:43.9871724Z * [new branch] gh/tugsbayasgalan/73/orig -> origin/gh/tugsbayasgalan/73/orig 2025-12-04T08:57:43.9873467Z * [new branch] gh/tugsbayasgalan/74/base -> origin/gh/tugsbayasgalan/74/base 2025-12-04T08:57:43.9874615Z * [new branch] gh/tugsbayasgalan/74/head -> origin/gh/tugsbayasgalan/74/head 2025-12-04T08:57:43.9875719Z * [new branch] gh/tugsbayasgalan/74/orig -> origin/gh/tugsbayasgalan/74/orig 2025-12-04T08:57:43.9877701Z * [new branch] gh/tugsbayasgalan/75/base -> origin/gh/tugsbayasgalan/75/base 2025-12-04T08:57:43.9878750Z * [new branch] gh/tugsbayasgalan/75/head -> origin/gh/tugsbayasgalan/75/head 2025-12-04T08:57:43.9879920Z * [new branch] gh/tugsbayasgalan/75/orig -> origin/gh/tugsbayasgalan/75/orig 2025-12-04T08:57:43.9881314Z * [new branch] gh/tugsbayasgalan/76/base -> origin/gh/tugsbayasgalan/76/base 2025-12-04T08:57:43.9882377Z * [new branch] gh/tugsbayasgalan/76/head -> origin/gh/tugsbayasgalan/76/head 2025-12-04T08:57:43.9883453Z * [new branch] gh/tugsbayasgalan/76/orig -> origin/gh/tugsbayasgalan/76/orig 2025-12-04T08:57:43.9885278Z * [new branch] gh/tugsbayasgalan/77/base -> origin/gh/tugsbayasgalan/77/base 2025-12-04T08:57:43.9886259Z * [new branch] gh/tugsbayasgalan/77/head -> origin/gh/tugsbayasgalan/77/head 2025-12-04T08:57:43.9887293Z * [new branch] gh/tugsbayasgalan/77/orig -> origin/gh/tugsbayasgalan/77/orig 2025-12-04T08:57:43.9888911Z * [new branch] gh/tugsbayasgalan/78/base -> origin/gh/tugsbayasgalan/78/base 2025-12-04T08:57:43.9890134Z * [new branch] gh/tugsbayasgalan/78/head -> origin/gh/tugsbayasgalan/78/head 2025-12-04T08:57:43.9891200Z * [new branch] gh/tugsbayasgalan/78/orig -> origin/gh/tugsbayasgalan/78/orig 2025-12-04T08:57:43.9892770Z * [new branch] gh/tugsbayasgalan/79/base -> origin/gh/tugsbayasgalan/79/base 2025-12-04T08:57:43.9893804Z * [new branch] gh/tugsbayasgalan/79/head -> origin/gh/tugsbayasgalan/79/head 2025-12-04T08:57:43.9895472Z * [new branch] gh/tugsbayasgalan/79/orig -> origin/gh/tugsbayasgalan/79/orig 
2025-12-04T08:57:43.9897296Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-12-04T08:57:43.9898323Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-12-04T08:57:43.9899452Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-12-04T08:57:43.9901129Z * [new branch] gh/tugsbayasgalan/80/base -> origin/gh/tugsbayasgalan/80/base 2025-12-04T08:57:43.9902024Z * [new branch] gh/tugsbayasgalan/80/head -> origin/gh/tugsbayasgalan/80/head 2025-12-04T08:57:43.9903156Z * [new branch] gh/tugsbayasgalan/80/orig -> origin/gh/tugsbayasgalan/80/orig 2025-12-04T08:57:43.9904803Z * [new branch] gh/tugsbayasgalan/81/base -> origin/gh/tugsbayasgalan/81/base 2025-12-04T08:57:43.9905784Z * [new branch] gh/tugsbayasgalan/81/head -> origin/gh/tugsbayasgalan/81/head 2025-12-04T08:57:43.9907112Z * [new branch] gh/tugsbayasgalan/81/orig -> origin/gh/tugsbayasgalan/81/orig 2025-12-04T08:57:43.9909706Z * [new branch] gh/tugsbayasgalan/82/base -> origin/gh/tugsbayasgalan/82/base 2025-12-04T08:57:43.9910947Z * [new branch] gh/tugsbayasgalan/82/head -> origin/gh/tugsbayasgalan/82/head 2025-12-04T08:57:43.9912059Z * [new branch] gh/tugsbayasgalan/82/orig -> origin/gh/tugsbayasgalan/82/orig 2025-12-04T08:57:43.9913539Z * [new branch] gh/tugsbayasgalan/83/base -> origin/gh/tugsbayasgalan/83/base 2025-12-04T08:57:43.9914628Z * [new branch] gh/tugsbayasgalan/83/head -> origin/gh/tugsbayasgalan/83/head 2025-12-04T08:57:43.9915722Z * [new branch] gh/tugsbayasgalan/83/orig -> origin/gh/tugsbayasgalan/83/orig 2025-12-04T08:57:43.9917233Z * [new branch] gh/tugsbayasgalan/84/base -> origin/gh/tugsbayasgalan/84/base 2025-12-04T08:57:43.9918241Z * [new branch] gh/tugsbayasgalan/84/head -> origin/gh/tugsbayasgalan/84/head 2025-12-04T08:57:43.9919301Z * [new branch] gh/tugsbayasgalan/84/orig -> origin/gh/tugsbayasgalan/84/orig 2025-12-04T08:57:43.9920677Z * [new branch] gh/tugsbayasgalan/85/base -> origin/gh/tugsbayasgalan/85/base 2025-12-04T08:57:43.9922359Z * [new branch] gh/tugsbayasgalan/85/head -> origin/gh/tugsbayasgalan/85/head 2025-12-04T08:57:43.9923415Z * [new branch] gh/tugsbayasgalan/85/orig -> origin/gh/tugsbayasgalan/85/orig 2025-12-04T08:57:43.9925043Z * [new branch] gh/tugsbayasgalan/86/base -> origin/gh/tugsbayasgalan/86/base 2025-12-04T08:57:43.9926182Z * [new branch] gh/tugsbayasgalan/86/head -> origin/gh/tugsbayasgalan/86/head 2025-12-04T08:57:43.9927321Z * [new branch] gh/tugsbayasgalan/86/orig -> origin/gh/tugsbayasgalan/86/orig 2025-12-04T08:57:43.9929216Z * [new branch] gh/tugsbayasgalan/87/base -> origin/gh/tugsbayasgalan/87/base 2025-12-04T08:57:43.9930349Z * [new branch] gh/tugsbayasgalan/87/head -> origin/gh/tugsbayasgalan/87/head 2025-12-04T08:57:43.9931467Z * [new branch] gh/tugsbayasgalan/87/orig -> origin/gh/tugsbayasgalan/87/orig 2025-12-04T08:57:43.9933396Z * [new branch] gh/tugsbayasgalan/88/base -> origin/gh/tugsbayasgalan/88/base 2025-12-04T08:57:43.9934413Z * [new branch] gh/tugsbayasgalan/88/head -> origin/gh/tugsbayasgalan/88/head 2025-12-04T08:57:43.9935516Z * [new branch] gh/tugsbayasgalan/88/orig -> origin/gh/tugsbayasgalan/88/orig 2025-12-04T08:57:43.9937425Z * [new branch] gh/tugsbayasgalan/89/base -> origin/gh/tugsbayasgalan/89/base 2025-12-04T08:57:43.9938505Z * [new branch] gh/tugsbayasgalan/89/head -> origin/gh/tugsbayasgalan/89/head 2025-12-04T08:57:43.9940107Z * [new branch] gh/tugsbayasgalan/89/orig -> origin/gh/tugsbayasgalan/89/orig 2025-12-04T08:57:43.9942107Z * [new branch] 
gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-12-04T08:57:43.9943077Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-12-04T08:57:43.9944213Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-12-04T08:57:43.9945990Z * [new branch] gh/tugsbayasgalan/90/base -> origin/gh/tugsbayasgalan/90/base 2025-12-04T08:57:43.9947147Z * [new branch] gh/tugsbayasgalan/90/head -> origin/gh/tugsbayasgalan/90/head 2025-12-04T08:57:43.9948135Z * [new branch] gh/tugsbayasgalan/90/orig -> origin/gh/tugsbayasgalan/90/orig 2025-12-04T08:57:43.9949995Z * [new branch] gh/tugsbayasgalan/91/base -> origin/gh/tugsbayasgalan/91/base 2025-12-04T08:57:43.9951055Z * [new branch] gh/tugsbayasgalan/91/head -> origin/gh/tugsbayasgalan/91/head 2025-12-04T08:57:43.9952092Z * [new branch] gh/tugsbayasgalan/91/orig -> origin/gh/tugsbayasgalan/91/orig 2025-12-04T08:57:43.9953748Z * [new branch] gh/tugsbayasgalan/92/base -> origin/gh/tugsbayasgalan/92/base 2025-12-04T08:57:43.9954765Z * [new branch] gh/tugsbayasgalan/92/head -> origin/gh/tugsbayasgalan/92/head 2025-12-04T08:57:43.9955874Z * [new branch] gh/tugsbayasgalan/92/orig -> origin/gh/tugsbayasgalan/92/orig 2025-12-04T08:57:43.9958005Z * [new branch] gh/tugsbayasgalan/93/base -> origin/gh/tugsbayasgalan/93/base 2025-12-04T08:57:43.9959141Z * [new branch] gh/tugsbayasgalan/93/head -> origin/gh/tugsbayasgalan/93/head 2025-12-04T08:57:43.9960234Z * [new branch] gh/tugsbayasgalan/93/orig -> origin/gh/tugsbayasgalan/93/orig 2025-12-04T08:57:43.9962081Z * [new branch] gh/v0i0/14/base -> origin/gh/v0i0/14/base 2025-12-04T08:57:43.9963083Z * [new branch] gh/v0i0/14/head -> origin/gh/v0i0/14/head 2025-12-04T08:57:43.9964166Z * [new branch] gh/v0i0/14/orig -> origin/gh/v0i0/14/orig 2025-12-04T08:57:43.9965646Z * [new branch] gh/v0i0/15/base -> origin/gh/v0i0/15/base 2025-12-04T08:57:43.9966742Z * [new branch] gh/v0i0/15/head -> origin/gh/v0i0/15/head 2025-12-04T08:57:43.9967823Z * [new branch] gh/v0i0/15/orig -> origin/gh/v0i0/15/orig 2025-12-04T08:57:43.9969378Z * [new branch] gh/v0i0/16/base -> origin/gh/v0i0/16/base 2025-12-04T08:57:43.9970411Z * [new branch] gh/v0i0/16/head -> origin/gh/v0i0/16/head 2025-12-04T08:57:43.9971491Z * [new branch] gh/v0i0/16/orig -> origin/gh/v0i0/16/orig 2025-12-04T08:57:43.9972994Z * [new branch] gh/v0i0/17/base -> origin/gh/v0i0/17/base 2025-12-04T08:57:43.9974061Z * [new branch] gh/v0i0/17/head -> origin/gh/v0i0/17/head 2025-12-04T08:57:43.9975310Z * [new branch] gh/v0i0/17/orig -> origin/gh/v0i0/17/orig 2025-12-04T08:57:43.9977164Z * [new branch] gh/v0i0/18/base -> origin/gh/v0i0/18/base 2025-12-04T08:57:43.9978481Z * [new branch] gh/v0i0/18/head -> origin/gh/v0i0/18/head 2025-12-04T08:57:43.9979525Z * [new branch] gh/v0i0/18/orig -> origin/gh/v0i0/18/orig 2025-12-04T08:57:43.9981172Z * [new branch] gh/v0i0/19/base -> origin/gh/v0i0/19/base 2025-12-04T08:57:43.9982277Z * [new branch] gh/v0i0/19/head -> origin/gh/v0i0/19/head 2025-12-04T08:57:43.9983410Z * [new branch] gh/v0i0/19/orig -> origin/gh/v0i0/19/orig 2025-12-04T08:57:43.9985343Z * [new branch] gh/vishal9-team/1/base -> origin/gh/vishal9-team/1/base 2025-12-04T08:57:43.9986442Z * [new branch] gh/vishal9-team/1/head -> origin/gh/vishal9-team/1/head 2025-12-04T08:57:43.9988349Z * [new branch] gh/vishal9-team/2/base -> origin/gh/vishal9-team/2/base 2025-12-04T08:57:43.9989504Z * [new branch] gh/vishal9-team/2/head -> origin/gh/vishal9-team/2/head 2025-12-04T08:57:43.9990643Z * [new branch] gh/vishal9-team/2/orig 
-> origin/gh/vishal9-team/2/orig 2025-12-04T08:57:43.9992136Z * [new branch] gh/vishal9-team/3/base -> origin/gh/vishal9-team/3/base 2025-12-04T08:57:43.9993371Z * [new branch] gh/vishal9-team/3/head -> origin/gh/vishal9-team/3/head 2025-12-04T08:57:43.9994398Z * [new branch] gh/vishal9-team/3/orig -> origin/gh/vishal9-team/3/orig 2025-12-04T08:57:43.9995884Z * [new branch] gh/vishal9-team/4/base -> origin/gh/vishal9-team/4/base 2025-12-04T08:57:43.9996888Z * [new branch] gh/vishal9-team/4/head -> origin/gh/vishal9-team/4/head 2025-12-04T08:57:43.9997948Z * [new branch] gh/vishal9-team/4/orig -> origin/gh/vishal9-team/4/orig 2025-12-04T08:57:43.9999722Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-12-04T08:57:44.0001140Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-12-04T08:57:44.0002576Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-12-04T08:57:44.0004291Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-12-04T08:57:44.0005416Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-12-04T08:57:44.0006557Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-12-04T08:57:44.0008170Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-12-04T08:57:44.0009289Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-12-04T08:57:44.0011029Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-12-04T08:57:44.0012433Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-12-04T08:57:44.0013511Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-12-04T08:57:44.0014602Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-12-04T08:57:44.0016135Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-12-04T08:57:44.0017504Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-12-04T08:57:44.0018625Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-12-04T08:57:44.0020182Z * [new branch] gh/wconstab/448/base -> origin/gh/wconstab/448/base 2025-12-04T08:57:44.0021524Z * [new branch] gh/wconstab/448/head -> origin/gh/wconstab/448/head 2025-12-04T08:57:44.0022703Z * [new branch] gh/wconstab/448/orig -> origin/gh/wconstab/448/orig 2025-12-04T08:57:44.0024230Z * [new branch] gh/wconstab/449/base -> origin/gh/wconstab/449/base 2025-12-04T08:57:44.0025339Z * [new branch] gh/wconstab/449/head -> origin/gh/wconstab/449/head 2025-12-04T08:57:44.0026546Z * [new branch] gh/wconstab/449/orig -> origin/gh/wconstab/449/orig 2025-12-04T08:57:44.0027899Z * [new branch] gh/wconstab/450/base -> origin/gh/wconstab/450/base 2025-12-04T08:57:44.0029038Z * [new branch] gh/wconstab/450/head -> origin/gh/wconstab/450/head 2025-12-04T08:57:44.0030188Z * [new branch] gh/wconstab/450/orig -> origin/gh/wconstab/450/orig 2025-12-04T08:57:44.0031599Z * [new branch] gh/wconstab/451/base -> origin/gh/wconstab/451/base 2025-12-04T08:57:44.0032882Z * [new branch] gh/wconstab/451/head -> origin/gh/wconstab/451/head 2025-12-04T08:57:44.0033996Z * [new branch] gh/wconstab/451/orig -> origin/gh/wconstab/451/orig 2025-12-04T08:57:44.0035592Z * [new branch] gh/wconstab/452/base -> origin/gh/wconstab/452/base 2025-12-04T08:57:44.0036582Z * [new branch] gh/wconstab/452/head -> origin/gh/wconstab/452/head 2025-12-04T08:57:44.0037615Z * [new branch] gh/wconstab/452/orig -> origin/gh/wconstab/452/orig 2025-12-04T08:57:44.0039205Z * [new branch] gh/wconstab/453/base -> 
origin/gh/wconstab/453/base 2025-12-04T08:57:44.0040222Z * [new branch] gh/wconstab/453/head -> origin/gh/wconstab/453/head 2025-12-04T08:57:44.0041405Z * [new branch] gh/wconstab/453/orig -> origin/gh/wconstab/453/orig 2025-12-04T08:57:44.0042904Z * [new branch] gh/wconstab/454/base -> origin/gh/wconstab/454/base 2025-12-04T08:57:44.0043916Z * [new branch] gh/wconstab/454/head -> origin/gh/wconstab/454/head 2025-12-04T08:57:44.0045017Z * [new branch] gh/wconstab/454/orig -> origin/gh/wconstab/454/orig 2025-12-04T08:57:44.0046545Z * [new branch] gh/wconstab/455/base -> origin/gh/wconstab/455/base 2025-12-04T08:57:44.0047606Z * [new branch] gh/wconstab/455/head -> origin/gh/wconstab/455/head 2025-12-04T08:57:44.0048721Z * [new branch] gh/wconstab/455/orig -> origin/gh/wconstab/455/orig 2025-12-04T08:57:44.0050949Z * [new branch] gh/wconstab/456/base -> origin/gh/wconstab/456/base 2025-12-04T08:57:44.0052337Z * [new branch] gh/wconstab/456/head -> origin/gh/wconstab/456/head 2025-12-04T08:57:44.0053567Z * [new branch] gh/wconstab/456/orig -> origin/gh/wconstab/456/orig 2025-12-04T08:57:44.0055172Z * [new branch] gh/wconstab/457/base -> origin/gh/wconstab/457/base 2025-12-04T08:57:44.0056200Z * [new branch] gh/wconstab/457/head -> origin/gh/wconstab/457/head 2025-12-04T08:57:44.0057826Z * [new branch] gh/wconstab/457/orig -> origin/gh/wconstab/457/orig 2025-12-04T08:57:44.0059386Z * [new branch] gh/wconstab/458/base -> origin/gh/wconstab/458/base 2025-12-04T08:57:44.0060521Z * [new branch] gh/wconstab/458/head -> origin/gh/wconstab/458/head 2025-12-04T08:57:44.0061621Z * [new branch] gh/wconstab/458/orig -> origin/gh/wconstab/458/orig 2025-12-04T08:57:44.0063080Z * [new branch] gh/wconstab/459/base -> origin/gh/wconstab/459/base 2025-12-04T08:57:44.0064197Z * [new branch] gh/wconstab/459/head -> origin/gh/wconstab/459/head 2025-12-04T08:57:44.0065746Z * [new branch] gh/wconstab/459/orig -> origin/gh/wconstab/459/orig 2025-12-04T08:57:44.0067854Z * [new branch] gh/wconstab/460/base -> origin/gh/wconstab/460/base 2025-12-04T08:57:44.0069387Z * [new branch] gh/wconstab/460/head -> origin/gh/wconstab/460/head 2025-12-04T08:57:44.0070534Z * [new branch] gh/wconstab/460/orig -> origin/gh/wconstab/460/orig 2025-12-04T08:57:44.0072365Z * [new branch] gh/wconstab/461/base -> origin/gh/wconstab/461/base 2025-12-04T08:57:44.0073387Z * [new branch] gh/wconstab/461/head -> origin/gh/wconstab/461/head 2025-12-04T08:57:44.0074471Z * [new branch] gh/wconstab/461/orig -> origin/gh/wconstab/461/orig 2025-12-04T08:57:44.0075863Z * [new branch] gh/wconstab/462/base -> origin/gh/wconstab/462/base 2025-12-04T08:57:44.0077025Z * [new branch] gh/wconstab/462/head -> origin/gh/wconstab/462/head 2025-12-04T08:57:44.0078204Z * [new branch] gh/wconstab/462/orig -> origin/gh/wconstab/462/orig 2025-12-04T08:57:44.0079800Z * [new branch] gh/wconstab/463/base -> origin/gh/wconstab/463/base 2025-12-04T08:57:44.0080892Z * [new branch] gh/wconstab/463/head -> origin/gh/wconstab/463/head 2025-12-04T08:57:44.0082036Z * [new branch] gh/wconstab/463/orig -> origin/gh/wconstab/463/orig 2025-12-04T08:57:44.0083601Z * [new branch] gh/wconstab/464/base -> origin/gh/wconstab/464/base 2025-12-04T08:57:44.0084647Z * [new branch] gh/wconstab/464/head -> origin/gh/wconstab/464/head 2025-12-04T08:57:44.0085873Z * [new branch] gh/wconstab/464/orig -> origin/gh/wconstab/464/orig 2025-12-04T08:57:44.0087345Z * [new branch] gh/wconstab/465/base -> origin/gh/wconstab/465/base 2025-12-04T08:57:44.0088515Z * [new branch] gh/wconstab/465/head -> 
origin/gh/wconstab/465/head 2025-12-04T08:57:44.0108375Z * [new branch] gh/wconstab/465/orig -> origin/gh/wconstab/465/orig 2025-12-04T08:57:44.0109278Z * [new branch] gh/wconstab/466/base -> origin/gh/wconstab/466/base 2025-12-04T08:57:44.0109921Z * [new branch] gh/wconstab/466/head -> origin/gh/wconstab/466/head 2025-12-04T08:57:44.0110552Z * [new branch] gh/wconstab/466/orig -> origin/gh/wconstab/466/orig 2025-12-04T08:57:44.0111183Z * [new branch] gh/wconstab/467/base -> origin/gh/wconstab/467/base 2025-12-04T08:57:44.0111794Z * [new branch] gh/wconstab/467/head -> origin/gh/wconstab/467/head 2025-12-04T08:57:44.0112427Z * [new branch] gh/wconstab/467/orig -> origin/gh/wconstab/467/orig 2025-12-04T08:57:44.0113057Z * [new branch] gh/wconstab/468/base -> origin/gh/wconstab/468/base 2025-12-04T08:57:44.0113667Z * [new branch] gh/wconstab/468/head -> origin/gh/wconstab/468/head 2025-12-04T08:57:44.0114285Z * [new branch] gh/wconstab/468/orig -> origin/gh/wconstab/468/orig 2025-12-04T08:57:44.0114917Z * [new branch] gh/weifengpy/39/base -> origin/gh/weifengpy/39/base 2025-12-04T08:57:44.0115549Z * [new branch] gh/weifengpy/39/head -> origin/gh/weifengpy/39/head 2025-12-04T08:57:44.0116168Z * [new branch] gh/weifengpy/39/orig -> origin/gh/weifengpy/39/orig 2025-12-04T08:57:44.0116798Z * [new branch] gh/weifengpy/40/base -> origin/gh/weifengpy/40/base 2025-12-04T08:57:44.0117425Z * [new branch] gh/weifengpy/40/head -> origin/gh/weifengpy/40/head 2025-12-04T08:57:44.0118066Z * [new branch] gh/weifengpy/40/orig -> origin/gh/weifengpy/40/orig 2025-12-04T08:57:44.0118688Z * [new branch] gh/weifengpy/41/base -> origin/gh/weifengpy/41/base 2025-12-04T08:57:44.0119320Z * [new branch] gh/weifengpy/41/head -> origin/gh/weifengpy/41/head 2025-12-04T08:57:44.0119947Z * [new branch] gh/weifengpy/41/orig -> origin/gh/weifengpy/41/orig 2025-12-04T08:57:44.0120606Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-12-04T08:57:44.0121641Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-12-04T08:57:44.0122334Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-12-04T08:57:44.0123027Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-12-04T08:57:44.0123718Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-12-04T08:57:44.0124397Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-12-04T08:57:44.0125095Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-12-04T08:57:44.0125782Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-12-04T08:57:44.0126471Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-12-04T08:57:44.0128080Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-12-04T08:57:44.0129106Z * [new branch] gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-12-04T08:57:44.0130265Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-12-04T08:57:44.0131908Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-12-04T08:57:44.0133255Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-12-04T08:57:44.0134289Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-12-04T08:57:44.0136015Z * [new branch] gh/williamwen42/296/base -> origin/gh/williamwen42/296/base 2025-12-04T08:57:44.0137716Z * [new 
branch] gh/williamwen42/296/head -> origin/gh/williamwen42/296/head 2025-12-04T08:57:44.0138748Z * [new branch] gh/williamwen42/296/orig -> origin/gh/williamwen42/296/orig 2025-12-04T08:57:44.0140235Z * [new branch] gh/williamwen42/297/base -> origin/gh/williamwen42/297/base 2025-12-04T08:57:44.0141297Z * [new branch] gh/williamwen42/297/head -> origin/gh/williamwen42/297/head 2025-12-04T08:57:44.0142440Z * [new branch] gh/williamwen42/297/orig -> origin/gh/williamwen42/297/orig 2025-12-04T08:57:44.0144017Z * [new branch] gh/williamwen42/306/base -> origin/gh/williamwen42/306/base 2025-12-04T08:57:44.0145154Z * [new branch] gh/williamwen42/306/head -> origin/gh/williamwen42/306/head 2025-12-04T08:57:44.0146332Z * [new branch] gh/williamwen42/306/orig -> origin/gh/williamwen42/306/orig 2025-12-04T08:57:44.0147964Z * [new branch] gh/williamwen42/309/base -> origin/gh/williamwen42/309/base 2025-12-04T08:57:44.0149312Z * [new branch] gh/williamwen42/309/head -> origin/gh/williamwen42/309/head 2025-12-04T08:57:44.0150396Z * [new branch] gh/williamwen42/309/orig -> origin/gh/williamwen42/309/orig 2025-12-04T08:57:44.0151940Z * [new branch] gh/williamwen42/310/base -> origin/gh/williamwen42/310/base 2025-12-04T08:57:44.0152939Z * [new branch] gh/williamwen42/310/head -> origin/gh/williamwen42/310/head 2025-12-04T08:57:44.0154042Z * [new branch] gh/williamwen42/310/orig -> origin/gh/williamwen42/310/orig 2025-12-04T08:57:44.0156983Z * [new branch] gh/williamwen42/311/base -> origin/gh/williamwen42/311/base 2025-12-04T08:57:44.0158024Z * [new branch] gh/williamwen42/311/head -> origin/gh/williamwen42/311/head 2025-12-04T08:57:44.0159085Z * [new branch] gh/williamwen42/311/orig -> origin/gh/williamwen42/311/orig 2025-12-04T08:57:44.0160477Z * [new branch] gh/williamwen42/319/base -> origin/gh/williamwen42/319/base 2025-12-04T08:57:44.0161535Z * [new branch] gh/williamwen42/319/head -> origin/gh/williamwen42/319/head 2025-12-04T08:57:44.0162616Z * [new branch] gh/williamwen42/319/orig -> origin/gh/williamwen42/319/orig 2025-12-04T08:57:44.0164198Z * [new branch] gh/williamwen42/325/base -> origin/gh/williamwen42/325/base 2025-12-04T08:57:44.0165823Z * [new branch] gh/williamwen42/325/head -> origin/gh/williamwen42/325/head 2025-12-04T08:57:44.0166638Z * [new branch] gh/williamwen42/325/orig -> origin/gh/williamwen42/325/orig 2025-12-04T08:57:44.0168113Z * [new branch] gh/williamwen42/326/base -> origin/gh/williamwen42/326/base 2025-12-04T08:57:44.0169244Z * [new branch] gh/williamwen42/326/head -> origin/gh/williamwen42/326/head 2025-12-04T08:57:44.0170391Z * [new branch] gh/williamwen42/326/orig -> origin/gh/williamwen42/326/orig 2025-12-04T08:57:44.0172405Z * [new branch] gh/williamwen42/327/base -> origin/gh/williamwen42/327/base 2025-12-04T08:57:44.0175505Z * [new branch] gh/williamwen42/327/head -> origin/gh/williamwen42/327/head 2025-12-04T08:57:44.0176441Z * [new branch] gh/williamwen42/327/orig -> origin/gh/williamwen42/327/orig 2025-12-04T08:57:44.0177300Z * [new branch] gh/williamwen42/328/base -> origin/gh/williamwen42/328/base 2025-12-04T08:57:44.0178222Z * [new branch] gh/williamwen42/328/head -> origin/gh/williamwen42/328/head 2025-12-04T08:57:44.0179237Z * [new branch] gh/williamwen42/328/orig -> origin/gh/williamwen42/328/orig 2025-12-04T08:57:44.0181118Z * [new branch] gh/williamwen42/329/base -> origin/gh/williamwen42/329/base 2025-12-04T08:57:44.0182418Z * [new branch] gh/williamwen42/329/head -> origin/gh/williamwen42/329/head 2025-12-04T08:57:44.0183624Z * [new branch] 
gh/williamwen42/329/orig -> origin/gh/williamwen42/329/orig 2025-12-04T08:57:44.0185219Z * [new branch] gh/williamwen42/330/base -> origin/gh/williamwen42/330/base 2025-12-04T08:57:44.0186368Z * [new branch] gh/williamwen42/330/head -> origin/gh/williamwen42/330/head 2025-12-04T08:57:44.0187533Z * [new branch] gh/williamwen42/330/orig -> origin/gh/williamwen42/330/orig 2025-12-04T08:57:44.0189169Z * [new branch] gh/williamwen42/331/base -> origin/gh/williamwen42/331/base 2025-12-04T08:57:44.0190208Z * [new branch] gh/williamwen42/331/head -> origin/gh/williamwen42/331/head 2025-12-04T08:57:44.0191268Z * [new branch] gh/williamwen42/331/orig -> origin/gh/williamwen42/331/orig 2025-12-04T08:57:44.0192676Z * [new branch] gh/williamwen42/332/base -> origin/gh/williamwen42/332/base 2025-12-04T08:57:44.0193711Z * [new branch] gh/williamwen42/332/head -> origin/gh/williamwen42/332/head 2025-12-04T08:57:44.0194816Z * [new branch] gh/williamwen42/332/orig -> origin/gh/williamwen42/332/orig 2025-12-04T08:57:44.0196572Z * [new branch] gh/williamwen42/333/base -> origin/gh/williamwen42/333/base 2025-12-04T08:57:44.0197672Z * [new branch] gh/williamwen42/333/head -> origin/gh/williamwen42/333/head 2025-12-04T08:57:44.0198745Z * [new branch] gh/williamwen42/333/orig -> origin/gh/williamwen42/333/orig 2025-12-04T08:57:44.0200754Z * [new branch] gh/williamwen42/334/base -> origin/gh/williamwen42/334/base 2025-12-04T08:57:44.0201846Z * [new branch] gh/williamwen42/334/head -> origin/gh/williamwen42/334/head 2025-12-04T08:57:44.0202983Z * [new branch] gh/williamwen42/334/orig -> origin/gh/williamwen42/334/orig 2025-12-04T08:57:44.0204713Z * [new branch] gh/williamwen42/335/base -> origin/gh/williamwen42/335/base 2025-12-04T08:57:44.0209520Z * [new branch] gh/williamwen42/335/head -> origin/gh/williamwen42/335/head 2025-12-04T08:57:44.0210628Z * [new branch] gh/williamwen42/335/orig -> origin/gh/williamwen42/335/orig 2025-12-04T08:57:44.0212287Z * [new branch] gh/williamwen42/336/base -> origin/gh/williamwen42/336/base 2025-12-04T08:57:44.0213298Z * [new branch] gh/williamwen42/336/head -> origin/gh/williamwen42/336/head 2025-12-04T08:57:44.0214324Z * [new branch] gh/williamwen42/336/orig -> origin/gh/williamwen42/336/orig 2025-12-04T08:57:44.0215904Z * [new branch] gh/williamwen42/337/base -> origin/gh/williamwen42/337/base 2025-12-04T08:57:44.0217422Z * [new branch] gh/williamwen42/337/head -> origin/gh/williamwen42/337/head 2025-12-04T08:57:44.0219054Z * [new branch] gh/williamwen42/337/orig -> origin/gh/williamwen42/337/orig 2025-12-04T08:57:44.0220694Z * [new branch] gh/williamwen42/338/base -> origin/gh/williamwen42/338/base 2025-12-04T08:57:44.0222062Z * [new branch] gh/williamwen42/338/head -> origin/gh/williamwen42/338/head 2025-12-04T08:57:44.0223174Z * [new branch] gh/williamwen42/338/orig -> origin/gh/williamwen42/338/orig 2025-12-04T08:57:44.0224744Z * [new branch] gh/williamwen42/339/base -> origin/gh/williamwen42/339/base 2025-12-04T08:57:44.0225827Z * [new branch] gh/williamwen42/339/head -> origin/gh/williamwen42/339/head 2025-12-04T08:57:44.0226954Z * [new branch] gh/williamwen42/339/orig -> origin/gh/williamwen42/339/orig 2025-12-04T08:57:44.0228850Z * [new branch] gh/williamwen42/340/base -> origin/gh/williamwen42/340/base 2025-12-04T08:57:44.0229653Z * [new branch] gh/williamwen42/340/head -> origin/gh/williamwen42/340/head 2025-12-04T08:57:44.0230733Z * [new branch] gh/williamwen42/340/orig -> origin/gh/williamwen42/340/orig 2025-12-04T08:57:44.0232545Z * [new branch] 
gh/williamwen42/341/base -> origin/gh/williamwen42/341/base 2025-12-04T08:57:44.0233789Z * [new branch] gh/williamwen42/341/head -> origin/gh/williamwen42/341/head 2025-12-04T08:57:44.0234849Z * [new branch] gh/williamwen42/341/orig -> origin/gh/williamwen42/341/orig 2025-12-04T08:57:44.0236361Z * [new branch] gh/williamwen42/342/base -> origin/gh/williamwen42/342/base 2025-12-04T08:57:44.0237410Z * [new branch] gh/williamwen42/342/head -> origin/gh/williamwen42/342/head 2025-12-04T08:57:44.0238535Z * [new branch] gh/williamwen42/342/orig -> origin/gh/williamwen42/342/orig 2025-12-04T08:57:44.0240117Z * [new branch] gh/williamwen42/343/base -> origin/gh/williamwen42/343/base 2025-12-04T08:57:44.0241204Z * [new branch] gh/williamwen42/343/head -> origin/gh/williamwen42/343/head 2025-12-04T08:57:44.0242289Z * [new branch] gh/williamwen42/343/orig -> origin/gh/williamwen42/343/orig 2025-12-04T08:57:44.0243888Z * [new branch] gh/williamwen42/344/base -> origin/gh/williamwen42/344/base 2025-12-04T08:57:44.0244933Z * [new branch] gh/williamwen42/344/head -> origin/gh/williamwen42/344/head 2025-12-04T08:57:44.0246011Z * [new branch] gh/williamwen42/344/orig -> origin/gh/williamwen42/344/orig 2025-12-04T08:57:44.0247699Z * [new branch] gh/williamwen42/345/base -> origin/gh/williamwen42/345/base 2025-12-04T08:57:44.0248900Z * [new branch] gh/williamwen42/345/head -> origin/gh/williamwen42/345/head 2025-12-04T08:57:44.0250113Z * [new branch] gh/williamwen42/345/orig -> origin/gh/williamwen42/345/orig 2025-12-04T08:57:44.0251688Z * [new branch] gh/williamwen42/346/base -> origin/gh/williamwen42/346/base 2025-12-04T08:57:44.0253270Z * [new branch] gh/williamwen42/346/head -> origin/gh/williamwen42/346/head 2025-12-04T08:57:44.0254347Z * [new branch] gh/williamwen42/346/orig -> origin/gh/williamwen42/346/orig 2025-12-04T08:57:44.0256068Z * [new branch] gh/williamwen42/347/base -> origin/gh/williamwen42/347/base 2025-12-04T08:57:44.0257398Z * [new branch] gh/williamwen42/347/head -> origin/gh/williamwen42/347/head 2025-12-04T08:57:44.0258548Z * [new branch] gh/williamwen42/347/orig -> origin/gh/williamwen42/347/orig 2025-12-04T08:57:44.0260102Z * [new branch] gh/williamwen42/348/base -> origin/gh/williamwen42/348/base 2025-12-04T08:57:44.0261103Z * [new branch] gh/williamwen42/348/head -> origin/gh/williamwen42/348/head 2025-12-04T08:57:44.0262193Z * [new branch] gh/williamwen42/348/orig -> origin/gh/williamwen42/348/orig 2025-12-04T08:57:44.0263627Z * [new branch] gh/williamwen42/349/base -> origin/gh/williamwen42/349/base 2025-12-04T08:57:44.0264771Z * [new branch] gh/williamwen42/349/head -> origin/gh/williamwen42/349/head 2025-12-04T08:57:44.0265897Z * [new branch] gh/williamwen42/349/orig -> origin/gh/williamwen42/349/orig 2025-12-04T08:57:44.0267958Z * [new branch] gh/williamwen42/350/base -> origin/gh/williamwen42/350/base 2025-12-04T08:57:44.0269134Z * [new branch] gh/williamwen42/350/head -> origin/gh/williamwen42/350/head 2025-12-04T08:57:44.0270252Z * [new branch] gh/williamwen42/350/orig -> origin/gh/williamwen42/350/orig 2025-12-04T08:57:44.0271970Z * [new branch] gh/williamwen42/351/base -> origin/gh/williamwen42/351/base 2025-12-04T08:57:44.0273042Z * [new branch] gh/williamwen42/351/head -> origin/gh/williamwen42/351/head 2025-12-04T08:57:44.0274195Z * [new branch] gh/williamwen42/351/orig -> origin/gh/williamwen42/351/orig 2025-12-04T08:57:44.0275757Z * [new branch] gh/williamwen42/352/base -> origin/gh/williamwen42/352/base 2025-12-04T08:57:44.0276805Z * [new branch] 
gh/williamwen42/352/head -> origin/gh/williamwen42/352/head 2025-12-04T08:57:44.0277886Z * [new branch] gh/williamwen42/352/orig -> origin/gh/williamwen42/352/orig 2025-12-04T08:57:44.0279559Z * [new branch] gh/williamwen42/353/base -> origin/gh/williamwen42/353/base 2025-12-04T08:57:44.0280765Z * [new branch] gh/williamwen42/353/head -> origin/gh/williamwen42/353/head 2025-12-04T08:57:44.0281874Z * [new branch] gh/williamwen42/353/orig -> origin/gh/williamwen42/353/orig 2025-12-04T08:57:44.0283355Z * [new branch] gh/williamwen42/354/base -> origin/gh/williamwen42/354/base 2025-12-04T08:57:44.0284501Z * [new branch] gh/williamwen42/354/head -> origin/gh/williamwen42/354/head 2025-12-04T08:57:44.0285608Z * [new branch] gh/williamwen42/354/orig -> origin/gh/williamwen42/354/orig 2025-12-04T08:57:44.0287211Z * [new branch] gh/williamwen42/355/base -> origin/gh/williamwen42/355/base 2025-12-04T08:57:44.0288242Z * [new branch] gh/williamwen42/355/head -> origin/gh/williamwen42/355/head 2025-12-04T08:57:44.0289376Z * [new branch] gh/williamwen42/355/orig -> origin/gh/williamwen42/355/orig 2025-12-04T08:57:44.0290949Z * [new branch] gh/williamwen42/356/base -> origin/gh/williamwen42/356/base 2025-12-04T08:57:44.0291962Z * [new branch] gh/williamwen42/356/head -> origin/gh/williamwen42/356/head 2025-12-04T08:57:44.0293047Z * [new branch] gh/williamwen42/356/orig -> origin/gh/williamwen42/356/orig 2025-12-04T08:57:44.0294602Z * [new branch] gh/williamwen42/357/base -> origin/gh/williamwen42/357/base 2025-12-04T08:57:44.0295744Z * [new branch] gh/williamwen42/357/head -> origin/gh/williamwen42/357/head 2025-12-04T08:57:44.0297168Z * [new branch] gh/williamwen42/357/orig -> origin/gh/williamwen42/357/orig 2025-12-04T08:57:44.0298730Z * [new branch] gh/williamwen42/358/base -> origin/gh/williamwen42/358/base 2025-12-04T08:57:44.0299812Z * [new branch] gh/williamwen42/358/head -> origin/gh/williamwen42/358/head 2025-12-04T08:57:44.0300996Z * [new branch] gh/williamwen42/358/orig -> origin/gh/williamwen42/358/orig 2025-12-04T08:57:44.0302881Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-12-04T08:57:44.0303979Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-12-04T08:57:44.0305522Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-12-04T08:57:44.0306424Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-12-04T08:57:44.0307947Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-12-04T08:57:44.0309130Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-12-04T08:57:44.0310314Z * [new branch] gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-12-04T08:57:44.0311794Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 2025-12-04T08:57:44.0312826Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-12-04T08:57:44.0313906Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-12-04T08:57:44.0315304Z * [new branch] gh/xmfan/301/base -> origin/gh/xmfan/301/base 2025-12-04T08:57:44.0316568Z * [new branch] gh/xmfan/301/head -> origin/gh/xmfan/301/head 2025-12-04T08:57:44.0317573Z * [new branch] gh/xmfan/301/orig -> origin/gh/xmfan/301/orig 2025-12-04T08:57:44.0318994Z * [new branch] gh/xmfan/304/base -> origin/gh/xmfan/304/base 2025-12-04T08:57:44.0320066Z * [new branch] gh/xmfan/304/head -> origin/gh/xmfan/304/head 2025-12-04T08:57:44.0321491Z * [new branch] gh/xmfan/304/orig -> origin/gh/xmfan/304/orig 2025-12-04T08:57:44.0325653Z * [new branch] gh/xmfan/309/base -> 
origin/gh/xmfan/309/base 2025-12-04T08:57:44.0326781Z * [new branch] gh/xmfan/309/head -> origin/gh/xmfan/309/head 2025-12-04T08:57:44.0328078Z * [new branch] gh/xmfan/309/orig -> origin/gh/xmfan/309/orig 2025-12-04T08:57:44.0329574Z * [new branch] gh/xmfan/310/base -> origin/gh/xmfan/310/base 2025-12-04T08:57:44.0330718Z * [new branch] gh/xmfan/310/head -> origin/gh/xmfan/310/head 2025-12-04T08:57:44.0331820Z * [new branch] gh/xmfan/310/orig -> origin/gh/xmfan/310/orig 2025-12-04T08:57:44.0333272Z * [new branch] gh/xmfan/311/base -> origin/gh/xmfan/311/base 2025-12-04T08:57:44.0334467Z * [new branch] gh/xmfan/311/head -> origin/gh/xmfan/311/head 2025-12-04T08:57:44.0335584Z * [new branch] gh/xmfan/311/orig -> origin/gh/xmfan/311/orig 2025-12-04T08:57:44.0337814Z * [new branch] gh/xmfan/312/base -> origin/gh/xmfan/312/base 2025-12-04T08:57:44.0338947Z * [new branch] gh/xmfan/312/head -> origin/gh/xmfan/312/head 2025-12-04T08:57:44.0340046Z * [new branch] gh/xmfan/312/orig -> origin/gh/xmfan/312/orig 2025-12-04T08:57:44.0341504Z * [new branch] gh/xmfan/313/base -> origin/gh/xmfan/313/base 2025-12-04T08:57:44.0342594Z * [new branch] gh/xmfan/313/head -> origin/gh/xmfan/313/head 2025-12-04T08:57:44.0343793Z * [new branch] gh/xmfan/313/orig -> origin/gh/xmfan/313/orig 2025-12-04T08:57:44.0345610Z * [new branch] gh/xuanzhang816/27/base -> origin/gh/xuanzhang816/27/base 2025-12-04T08:57:44.0346712Z * [new branch] gh/xuanzhang816/27/head -> origin/gh/xuanzhang816/27/head 2025-12-04T08:57:44.0347847Z * [new branch] gh/xuanzhang816/27/orig -> origin/gh/xuanzhang816/27/orig 2025-12-04T08:57:44.0349558Z * [new branch] gh/xuanzhang816/32/base -> origin/gh/xuanzhang816/32/base 2025-12-04T08:57:44.0350638Z * [new branch] gh/xuanzhang816/32/head -> origin/gh/xuanzhang816/32/head 2025-12-04T08:57:44.0351715Z * [new branch] gh/xuanzhang816/32/orig -> origin/gh/xuanzhang816/32/orig 2025-12-04T08:57:44.0353184Z * [new branch] gh/xuanzhang816/33/base -> origin/gh/xuanzhang816/33/base 2025-12-04T08:57:44.0354283Z * [new branch] gh/xuanzhang816/33/head -> origin/gh/xuanzhang816/33/head 2025-12-04T08:57:44.0355361Z * [new branch] gh/xuanzhang816/33/orig -> origin/gh/xuanzhang816/33/orig 2025-12-04T08:57:44.0357186Z * [new branch] gh/xuanzhang816/34/base -> origin/gh/xuanzhang816/34/base 2025-12-04T08:57:44.0358439Z * [new branch] gh/xuanzhang816/34/head -> origin/gh/xuanzhang816/34/head 2025-12-04T08:57:44.0359531Z * [new branch] gh/xuanzhang816/34/orig -> origin/gh/xuanzhang816/34/orig 2025-12-04T08:57:44.0361229Z * [new branch] gh/xuanzhang816/35/base -> origin/gh/xuanzhang816/35/base 2025-12-04T08:57:44.0362291Z * [new branch] gh/xuanzhang816/35/head -> origin/gh/xuanzhang816/35/head 2025-12-04T08:57:44.0363383Z * [new branch] gh/xuanzhang816/35/orig -> origin/gh/xuanzhang816/35/orig 2025-12-04T08:57:44.0365294Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-12-04T08:57:44.0366339Z * [new branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-12-04T08:57:44.0367403Z * [new branch] gh/yanbing-j/11/orig -> origin/gh/yanbing-j/11/orig 2025-12-04T08:57:44.0368870Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-12-04T08:57:44.0369972Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-12-04T08:57:44.0371023Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-12-04T08:57:44.0372618Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-12-04T08:57:44.0373718Z * [new branch] gh/yanbing-j/13/head -> 
origin/gh/yanbing-j/13/head 2025-12-04T08:57:44.0374790Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-12-04T08:57:44.0376205Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-12-04T08:57:44.0377707Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-12-04T08:57:44.0378892Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-12-04T08:57:44.0380240Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-12-04T08:57:44.0381373Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-12-04T08:57:44.0382499Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-12-04T08:57:44.0383948Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-12-04T08:57:44.0385176Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-12-04T08:57:44.0386304Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-12-04T08:57:44.0387863Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-12-04T08:57:44.0388959Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-12-04T08:57:44.0390190Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-12-04T08:57:44.0392081Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 2025-12-04T08:57:44.0393188Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-12-04T08:57:44.0394313Z * [new branch] gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-12-04T08:57:44.0395788Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-12-04T08:57:44.0396903Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-12-04T08:57:44.0398378Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-12-04T08:57:44.0399450Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-12-04T08:57:44.0400490Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-12-04T08:57:44.0402054Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-12-04T08:57:44.0403206Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-12-04T08:57:44.0404300Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-12-04T08:57:44.0405744Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-12-04T08:57:44.0406828Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-12-04T08:57:44.0407930Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-12-04T08:57:44.0409430Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-12-04T08:57:44.0410476Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-12-04T08:57:44.0411535Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-12-04T08:57:44.0412976Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-12-04T08:57:44.0414049Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-12-04T08:57:44.0415129Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-12-04T08:57:44.0417557Z * [new branch] gh/yang-yu-hang/1/base -> origin/gh/yang-yu-hang/1/base 2025-12-04T08:57:44.0418746Z * [new branch] gh/yang-yu-hang/1/head -> origin/gh/yang-yu-hang/1/head 2025-12-04T08:57:44.0420062Z * [new branch] gh/yang-yu-hang/1/orig -> origin/gh/yang-yu-hang/1/orig 2025-12-04T08:57:44.0421861Z * [new branch] 
gh/yang-yu-hang/2/base -> origin/gh/yang-yu-hang/2/base 2025-12-04T08:57:44.0423266Z * [new branch] gh/yang-yu-hang/2/head -> origin/gh/yang-yu-hang/2/head 2025-12-04T08:57:44.0424669Z * [new branch] gh/yang-yu-hang/2/orig -> origin/gh/yang-yu-hang/2/orig 2025-12-04T08:57:44.0426166Z * [new branch] gh/yang-yu-hang/3/base -> origin/gh/yang-yu-hang/3/base 2025-12-04T08:57:44.0427337Z * [new branch] gh/yang-yu-hang/3/head -> origin/gh/yang-yu-hang/3/head 2025-12-04T08:57:44.0428487Z * [new branch] gh/yang-yu-hang/3/orig -> origin/gh/yang-yu-hang/3/orig 2025-12-04T08:57:44.0430296Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-12-04T08:57:44.0431799Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-12-04T08:57:44.0433005Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-12-04T08:57:44.0434973Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-12-04T08:57:44.0436161Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-12-04T08:57:44.0437232Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-12-04T08:57:44.0438655Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-12-04T08:57:44.0439799Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-12-04T08:57:44.0440899Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-12-04T08:57:44.0442344Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-12-04T08:57:44.0443458Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-12-04T08:57:44.0444525Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-12-04T08:57:44.0445966Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-12-04T08:57:44.0447011Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-12-04T08:57:44.0448088Z * [new branch] gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-12-04T08:57:44.0449573Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-12-04T08:57:44.0450829Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-12-04T08:57:44.0451877Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-12-04T08:57:44.0453289Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-12-04T08:57:44.0454388Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-12-04T08:57:44.0455585Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-12-04T08:57:44.0457680Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-12-04T08:57:44.0458731Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-12-04T08:57:44.0459803Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-12-04T08:57:44.0461344Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-12-04T08:57:44.0462400Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-12-04T08:57:44.0463616Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-12-04T08:57:44.0465289Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-12-04T08:57:44.0466455Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-12-04T08:57:44.0467551Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-12-04T08:57:44.0469032Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-12-04T08:57:44.0470107Z * [new branch] gh/ydwu4/296/head -> 
origin/gh/ydwu4/296/head 2025-12-04T08:57:44.0471220Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-12-04T08:57:44.0472746Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-12-04T08:57:44.0473875Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-12-04T08:57:44.0475509Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 2025-12-04T08:57:44.0477026Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-12-04T08:57:44.0478100Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-12-04T08:57:44.0479299Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-12-04T08:57:44.0480679Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-12-04T08:57:44.0481733Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-12-04T08:57:44.0482794Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-12-04T08:57:44.0484240Z * [new branch] gh/ydwu4/327/base -> origin/gh/ydwu4/327/base 2025-12-04T08:57:44.0485368Z * [new branch] gh/ydwu4/327/head -> origin/gh/ydwu4/327/head 2025-12-04T08:57:44.0486454Z * [new branch] gh/ydwu4/327/orig -> origin/gh/ydwu4/327/orig 2025-12-04T08:57:44.0488031Z * [new branch] gh/ydwu4/328/base -> origin/gh/ydwu4/328/base 2025-12-04T08:57:44.0489044Z * [new branch] gh/ydwu4/328/head -> origin/gh/ydwu4/328/head 2025-12-04T08:57:44.0490104Z * [new branch] gh/ydwu4/328/orig -> origin/gh/ydwu4/328/orig 2025-12-04T08:57:44.0491397Z * [new branch] gh/ydwu4/329/base -> origin/gh/ydwu4/329/base 2025-12-04T08:57:44.0492464Z * [new branch] gh/ydwu4/329/head -> origin/gh/ydwu4/329/head 2025-12-04T08:57:44.0493625Z * [new branch] gh/ydwu4/329/orig -> origin/gh/ydwu4/329/orig 2025-12-04T08:57:44.0495152Z * [new branch] gh/ydwu4/330/base -> origin/gh/ydwu4/330/base 2025-12-04T08:57:44.0496210Z * [new branch] gh/ydwu4/330/head -> origin/gh/ydwu4/330/head 2025-12-04T08:57:44.0497650Z * [new branch] gh/ydwu4/330/orig -> origin/gh/ydwu4/330/orig 2025-12-04T08:57:44.0499033Z * [new branch] gh/ydwu4/331/base -> origin/gh/ydwu4/331/base 2025-12-04T08:57:44.0500158Z * [new branch] gh/ydwu4/331/head -> origin/gh/ydwu4/331/head 2025-12-04T08:57:44.0501346Z * [new branch] gh/ydwu4/331/orig -> origin/gh/ydwu4/331/orig 2025-12-04T08:57:44.0502623Z * [new branch] gh/ydwu4/332/base -> origin/gh/ydwu4/332/base 2025-12-04T08:57:44.0503705Z * [new branch] gh/ydwu4/332/head -> origin/gh/ydwu4/332/head 2025-12-04T08:57:44.0504840Z * [new branch] gh/ydwu4/332/orig -> origin/gh/ydwu4/332/orig 2025-12-04T08:57:44.0506124Z * [new branch] gh/ydwu4/333/base -> origin/gh/ydwu4/333/base 2025-12-04T08:57:44.0507217Z * [new branch] gh/ydwu4/333/head -> origin/gh/ydwu4/333/head 2025-12-04T08:57:44.0508395Z * [new branch] gh/ydwu4/333/orig -> origin/gh/ydwu4/333/orig 2025-12-04T08:57:44.0509825Z * [new branch] gh/ydwu4/334/base -> origin/gh/ydwu4/334/base 2025-12-04T08:57:44.0510932Z * [new branch] gh/ydwu4/334/head -> origin/gh/ydwu4/334/head 2025-12-04T08:57:44.0512059Z * [new branch] gh/ydwu4/334/orig -> origin/gh/ydwu4/334/orig 2025-12-04T08:57:44.0513367Z * [new branch] gh/ydwu4/335/base -> origin/gh/ydwu4/335/base 2025-12-04T08:57:44.0514594Z * [new branch] gh/ydwu4/335/head -> origin/gh/ydwu4/335/head 2025-12-04T08:57:44.0515673Z * [new branch] gh/ydwu4/335/orig -> origin/gh/ydwu4/335/orig 2025-12-04T08:57:44.0517513Z * [new branch] gh/ydwu4/337/base -> origin/gh/ydwu4/337/base 2025-12-04T08:57:44.0518602Z * [new branch] gh/ydwu4/337/head -> origin/gh/ydwu4/337/head 
2025-12-04T08:57:44.0519708Z [... git fetch ref listing elided: several hundred "* [new branch] <name> -> origin/<name>" entries, wrapped mid-entry in the raw capture. They cover stacked-PR branches (gh/<user>/<n>/{base,head,orig} for ydwu4, yf225, yifuwang, yiming0416, yushangdi, zklaus, zou3519, zpcore), contributor feature branches (google-main through zxiiro/*), automation branches (update-audio-commit-hash/*, update-vision-commit-hash/*, update-vllm-*, update-xla-commit-hash/*, update_slow_tests_*, replace-pytorch-labs-*, revert-*), and long-lived refs (main, nightly, viable/strict, lts/release/1.8, release/1.4 through release/2.9, orig/release/1.6 through orig/release/2.9, v0.1.2 through v1.3.1) ...]
2025-12-04T08:57:44.1361641Z [... followed by "* [new tag] <name> -> <name>" entries: bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug, ci/binaries/77164, and ciflow trigger tags (ciflow/b200/*, ciflow/binaries/*, ciflow/binaries_wheel/*, ciflow/dynamo/*, ciflow/h100-cutlass-backend/*, ciflow/h100-distributed/*, ciflow/h100-symm-mem/*, ciflow/h100/*, ciflow/inductor-*/*, ciflow/inductor/*); the tag listing continues beyond this excerpt ...]
ciflow/inductor/163245 2025-12-04T08:57:44.1450722Z * [new tag] ciflow/inductor/163335 -> ciflow/inductor/163335 2025-12-04T08:57:44.1451474Z * [new tag] ciflow/inductor/163503 -> ciflow/inductor/163503 2025-12-04T08:57:44.1452183Z * [new tag] ciflow/inductor/163942 -> ciflow/inductor/163942 2025-12-04T08:57:44.1453083Z * [new tag] ciflow/inductor/165270 -> ciflow/inductor/165270 2025-12-04T08:57:44.1453770Z * [new tag] ciflow/inductor/165274 -> ciflow/inductor/165274 2025-12-04T08:57:44.1454489Z * [new tag] ciflow/inductor/165322 -> ciflow/inductor/165322 2025-12-04T08:57:44.1455221Z * [new tag] ciflow/inductor/165597 -> ciflow/inductor/165597 2025-12-04T08:57:44.1455956Z * [new tag] ciflow/inductor/166063 -> ciflow/inductor/166063 2025-12-04T08:57:44.1456956Z * [new tag] ciflow/inductor/166075 -> ciflow/inductor/166075 2025-12-04T08:57:44.1457731Z * [new tag] ciflow/inductor/166165 -> ciflow/inductor/166165 2025-12-04T08:57:44.1458847Z * [new tag] ciflow/inductor/166254 -> ciflow/inductor/166254 2025-12-04T08:57:44.1459462Z * [new tag] ciflow/inductor/166483 -> ciflow/inductor/166483 2025-12-04T08:57:44.1460216Z * [new tag] ciflow/inductor/166494 -> ciflow/inductor/166494 2025-12-04T08:57:44.1460946Z * [new tag] ciflow/inductor/166545 -> ciflow/inductor/166545 2025-12-04T08:57:44.1461802Z * [new tag] ciflow/inductor/166788 -> ciflow/inductor/166788 2025-12-04T08:57:44.1463197Z * [new tag] ciflow/inductor/166846 -> ciflow/inductor/166846 2025-12-04T08:57:44.1463894Z * [new tag] ciflow/inductor/167300 -> ciflow/inductor/167300 2025-12-04T08:57:44.1464645Z * [new tag] ciflow/inductor/167407 -> ciflow/inductor/167407 2025-12-04T08:57:44.1465574Z * [new tag] ciflow/inductor/167536 -> ciflow/inductor/167536 2025-12-04T08:57:44.1466275Z * [new tag] ciflow/inductor/167552 -> ciflow/inductor/167552 2025-12-04T08:57:44.1467022Z * [new tag] ciflow/inductor/167555 -> ciflow/inductor/167555 2025-12-04T08:57:44.1467969Z * [new tag] ciflow/inductor/167583 -> ciflow/inductor/167583 2025-12-04T08:57:44.1468649Z * [new tag] ciflow/inductor/167599 -> ciflow/inductor/167599 2025-12-04T08:57:44.1469498Z * [new tag] ciflow/inductor/167647 -> ciflow/inductor/167647 2025-12-04T08:57:44.1470232Z * [new tag] ciflow/inductor/167677 -> ciflow/inductor/167677 2025-12-04T08:57:44.1470956Z * [new tag] ciflow/inductor/167680 -> ciflow/inductor/167680 2025-12-04T08:57:44.1471695Z * [new tag] ciflow/inductor/167695 -> ciflow/inductor/167695 2025-12-04T08:57:44.1472431Z * [new tag] ciflow/inductor/167742 -> ciflow/inductor/167742 2025-12-04T08:57:44.1473173Z * [new tag] ciflow/inductor/167768 -> ciflow/inductor/167768 2025-12-04T08:57:44.1474201Z * [new tag] ciflow/inductor/167773 -> ciflow/inductor/167773 2025-12-04T08:57:44.1474867Z * [new tag] ciflow/inductor/167781 -> ciflow/inductor/167781 2025-12-04T08:57:44.1475638Z * [new tag] ciflow/inductor/167880 -> ciflow/inductor/167880 2025-12-04T08:57:44.1476352Z * [new tag] ciflow/inductor/167887 -> ciflow/inductor/167887 2025-12-04T08:57:44.1477079Z * [new tag] ciflow/inductor/167972 -> ciflow/inductor/167972 2025-12-04T08:57:44.1477794Z * [new tag] ciflow/inductor/167989 -> ciflow/inductor/167989 2025-12-04T08:57:44.1478565Z * [new tag] ciflow/inductor/168002 -> ciflow/inductor/168002 2025-12-04T08:57:44.1479245Z * [new tag] ciflow/inductor/168050 -> ciflow/inductor/168050 2025-12-04T08:57:44.1479976Z * [new tag] ciflow/inductor/168051 -> ciflow/inductor/168051 2025-12-04T08:57:44.1480707Z * [new tag] ciflow/inductor/168052 -> ciflow/inductor/168052 
2025-12-04T08:57:44.1481420Z * [new tag] ciflow/inductor/168073 -> ciflow/inductor/168073 2025-12-04T08:57:44.1482171Z * [new tag] ciflow/inductor/168096 -> ciflow/inductor/168096 2025-12-04T08:57:44.1482867Z * [new tag] ciflow/inductor/168114 -> ciflow/inductor/168114 2025-12-04T08:57:44.1483587Z * [new tag] ciflow/inductor/168115 -> ciflow/inductor/168115 2025-12-04T08:57:44.1484303Z * [new tag] ciflow/inductor/168127 -> ciflow/inductor/168127 2025-12-04T08:57:44.1485111Z * [new tag] ciflow/inductor/168129 -> ciflow/inductor/168129 2025-12-04T08:57:44.1485827Z * [new tag] ciflow/inductor/168157 -> ciflow/inductor/168157 2025-12-04T08:57:44.1486640Z * [new tag] ciflow/inductor/168175 -> ciflow/inductor/168175 2025-12-04T08:57:44.1487449Z * [new tag] ciflow/inductor/168185 -> ciflow/inductor/168185 2025-12-04T08:57:44.1488117Z * [new tag] ciflow/inductor/168195 -> ciflow/inductor/168195 2025-12-04T08:57:44.1488826Z * [new tag] ciflow/inductor/168209 -> ciflow/inductor/168209 2025-12-04T08:57:44.1489553Z * [new tag] ciflow/inductor/168266 -> ciflow/inductor/168266 2025-12-04T08:57:44.1490276Z * [new tag] ciflow/inductor/168316 -> ciflow/inductor/168316 2025-12-04T08:57:44.1491218Z * [new tag] ciflow/inductor/168326 -> ciflow/inductor/168326 2025-12-04T08:57:44.1491853Z * [new tag] ciflow/inductor/168368 -> ciflow/inductor/168368 2025-12-04T08:57:44.1492592Z * [new tag] ciflow/inductor/168894 -> ciflow/inductor/168894 2025-12-04T08:57:44.1493309Z * [new tag] ciflow/inductor/168934 -> ciflow/inductor/168934 2025-12-04T08:57:44.1494065Z * [new tag] ciflow/inductor/168939 -> ciflow/inductor/168939 2025-12-04T08:57:44.1494758Z * [new tag] ciflow/inductor/168946 -> ciflow/inductor/168946 2025-12-04T08:57:44.1495487Z * [new tag] ciflow/inductor/168950 -> ciflow/inductor/168950 2025-12-04T08:57:44.1496218Z * [new tag] ciflow/inductor/168951 -> ciflow/inductor/168951 2025-12-04T08:57:44.1497283Z * [new tag] ciflow/inductor/168952 -> ciflow/inductor/168952 2025-12-04T08:57:44.1498020Z * [new tag] ciflow/inductor/168955 -> ciflow/inductor/168955 2025-12-04T08:57:44.1498785Z * [new tag] ciflow/inductor/168971 -> ciflow/inductor/168971 2025-12-04T08:57:44.1499536Z * [new tag] ciflow/inductor/168979 -> ciflow/inductor/168979 2025-12-04T08:57:44.1500333Z * [new tag] ciflow/inductor/168980 -> ciflow/inductor/168980 2025-12-04T08:57:44.1501273Z * [new tag] ciflow/inductor/168983 -> ciflow/inductor/168983 2025-12-04T08:57:44.1501978Z * [new tag] ciflow/inductor/169006 -> ciflow/inductor/169006 2025-12-04T08:57:44.1502714Z * [new tag] ciflow/inductor/169023 -> ciflow/inductor/169023 2025-12-04T08:57:44.1503462Z * [new tag] ciflow/inductor/169024 -> ciflow/inductor/169024 2025-12-04T08:57:44.1504240Z * [new tag] ciflow/inductor/169025 -> ciflow/inductor/169025 2025-12-04T08:57:44.1504988Z * [new tag] ciflow/inductor/169066 -> ciflow/inductor/169066 2025-12-04T08:57:44.1505735Z * [new tag] ciflow/inductor/169091 -> ciflow/inductor/169091 2025-12-04T08:57:44.1506479Z * [new tag] ciflow/inductor/169102 -> ciflow/inductor/169102 2025-12-04T08:57:44.1507239Z * [new tag] ciflow/inductor/169103 -> ciflow/inductor/169103 2025-12-04T08:57:44.1507977Z * [new tag] ciflow/inductor/169121 -> ciflow/inductor/169121 2025-12-04T08:57:44.1508720Z * [new tag] ciflow/inductor/169134 -> ciflow/inductor/169134 2025-12-04T08:57:44.1509560Z * [new tag] ciflow/inductor/169135 -> ciflow/inductor/169135 2025-12-04T08:57:44.1510298Z * [new tag] ciflow/inductor/169141 -> ciflow/inductor/169141 2025-12-04T08:57:44.1511227Z * [new tag] 
ciflow/inductor/169151 -> ciflow/inductor/169151 2025-12-04T08:57:44.1512474Z * [new tag] ciflow/inductor/169161 -> ciflow/inductor/169161 2025-12-04T08:57:44.1513165Z * [new tag] ciflow/inductor/169167 -> ciflow/inductor/169167 2025-12-04T08:57:44.1514109Z * [new tag] ciflow/inductor/169177 -> ciflow/inductor/169177 2025-12-04T08:57:44.1514899Z * [new tag] ciflow/inductor/169185 -> ciflow/inductor/169185 2025-12-04T08:57:44.1515651Z * [new tag] ciflow/inductor/169196 -> ciflow/inductor/169196 2025-12-04T08:57:44.1516447Z * [new tag] ciflow/inductor/169200 -> ciflow/inductor/169200 2025-12-04T08:57:44.1517108Z * [new tag] ciflow/inductor/169204 -> ciflow/inductor/169204 2025-12-04T08:57:44.1517813Z * [new tag] ciflow/inductor/169216 -> ciflow/inductor/169216 2025-12-04T08:57:44.1518743Z * [new tag] ciflow/inductor/169219 -> ciflow/inductor/169219 2025-12-04T08:57:44.1519398Z * [new tag] ciflow/inductor/169220 -> ciflow/inductor/169220 2025-12-04T08:57:44.1520302Z * [new tag] ciflow/inductor/169230 -> ciflow/inductor/169230 2025-12-04T08:57:44.1521135Z * [new tag] ciflow/inductor/169242 -> ciflow/inductor/169242 2025-12-04T08:57:44.1522320Z * [new tag] ciflow/inductor/169245 -> ciflow/inductor/169245 2025-12-04T08:57:44.1523037Z * [new tag] ciflow/inductor/169260 -> ciflow/inductor/169260 2025-12-04T08:57:44.1523801Z * [new tag] ciflow/inductor/169282 -> ciflow/inductor/169282 2025-12-04T08:57:44.1524528Z * [new tag] ciflow/inductor/169286 -> ciflow/inductor/169286 2025-12-04T08:57:44.1525286Z * [new tag] ciflow/inductor/169299 -> ciflow/inductor/169299 2025-12-04T08:57:44.1526232Z * [new tag] ciflow/inductor/169304 -> ciflow/inductor/169304 2025-12-04T08:57:44.1527463Z * [new tag] ciflow/inductor/169305 -> ciflow/inductor/169305 2025-12-04T08:57:44.1528131Z * [new tag] ciflow/inductor/169308 -> ciflow/inductor/169308 2025-12-04T08:57:44.1528904Z * [new tag] ciflow/inductor/169319 -> ciflow/inductor/169319 2025-12-04T08:57:44.1529639Z * [new tag] ciflow/inductor/169326 -> ciflow/inductor/169326 2025-12-04T08:57:44.1530396Z * [new tag] ciflow/inductor/169332 -> ciflow/inductor/169332 2025-12-04T08:57:44.1531147Z * [new tag] ciflow/inductor/169333 -> ciflow/inductor/169333 2025-12-04T08:57:44.1532168Z * [new tag] ciflow/inductor/169336 -> ciflow/inductor/169336 2025-12-04T08:57:44.1532886Z * [new tag] ciflow/inductor/169340 -> ciflow/inductor/169340 2025-12-04T08:57:44.1533790Z * [new tag] ciflow/inductor/169341 -> ciflow/inductor/169341 2025-12-04T08:57:44.1534604Z * [new tag] ciflow/inductor/169343 -> ciflow/inductor/169343 2025-12-04T08:57:44.1535364Z * [new tag] ciflow/inductor/169346 -> ciflow/inductor/169346 2025-12-04T08:57:44.1536265Z * [new tag] ciflow/inductor/169348 -> ciflow/inductor/169348 2025-12-04T08:57:44.1537441Z * [new tag] ciflow/inductor/169350 -> ciflow/inductor/169350 2025-12-04T08:57:44.1538228Z * [new tag] ciflow/inductor/169355 -> ciflow/inductor/169355 2025-12-04T08:57:44.1539034Z * [new tag] ciflow/inductor/169370 -> ciflow/inductor/169370 2025-12-04T08:57:44.1540139Z * [new tag] ciflow/inductor/169375 -> ciflow/inductor/169375 2025-12-04T08:57:44.1540853Z * [new tag] ciflow/inductor/169389 -> ciflow/inductor/169389 2025-12-04T08:57:44.1541598Z * [new tag] ciflow/inductor/169391 -> ciflow/inductor/169391 2025-12-04T08:57:44.1542368Z * [new tag] ciflow/inductor/169393 -> ciflow/inductor/169393 2025-12-04T08:57:44.1543091Z * [new tag] ciflow/inductor/169399 -> ciflow/inductor/169399 2025-12-04T08:57:44.1544032Z * [new tag] ciflow/inductor/169400 -> 
ciflow/inductor/169400 2025-12-04T08:57:44.1544751Z * [new tag] ciflow/inductor/169415 -> ciflow/inductor/169415 2025-12-04T08:57:44.1545537Z * [new tag] ciflow/inductor/169417 -> ciflow/inductor/169417 2025-12-04T08:57:44.1546427Z * [new tag] ciflow/inductor/169418 -> ciflow/inductor/169418 2025-12-04T08:57:44.1547365Z * [new tag] ciflow/inductor/169430 -> ciflow/inductor/169430 2025-12-04T08:57:44.1548039Z * [new tag] ciflow/inductor/169432 -> ciflow/inductor/169432 2025-12-04T08:57:44.1548819Z * [new tag] ciflow/inductor/169436 -> ciflow/inductor/169436 2025-12-04T08:57:44.1549832Z * [new tag] ciflow/inductor/169437 -> ciflow/inductor/169437 2025-12-04T08:57:44.1550505Z * [new tag] ciflow/inductor/169438 -> ciflow/inductor/169438 2025-12-04T08:57:44.1551222Z * [new tag] ciflow/inductor/169441 -> ciflow/inductor/169441 2025-12-04T08:57:44.1551977Z * [new tag] ciflow/inductor/169446 -> ciflow/inductor/169446 2025-12-04T08:57:44.1552900Z * [new tag] ciflow/inductor/169447 -> ciflow/inductor/169447 2025-12-04T08:57:44.1553580Z * [new tag] ciflow/inductor/169452 -> ciflow/inductor/169452 2025-12-04T08:57:44.1554492Z * [new tag] ciflow/inductor/169455 -> ciflow/inductor/169455 2025-12-04T08:57:44.1555167Z * [new tag] ciflow/inductor/169459 -> ciflow/inductor/169459 2025-12-04T08:57:44.1556112Z * [new tag] ciflow/inductor/169463 -> ciflow/inductor/169463 2025-12-04T08:57:44.1557022Z * [new tag] ciflow/inductor/169476 -> ciflow/inductor/169476 2025-12-04T08:57:44.1557685Z * [new tag] ciflow/inductor/169485 -> ciflow/inductor/169485 2025-12-04T08:57:44.1558434Z * [new tag] ciflow/inductor/169493 -> ciflow/inductor/169493 2025-12-04T08:57:44.1559143Z * [new tag] ciflow/inductor/169496 -> ciflow/inductor/169496 2025-12-04T08:57:44.1559878Z * [new tag] ciflow/inductor/169497 -> ciflow/inductor/169497 2025-12-04T08:57:44.1560607Z * [new tag] ciflow/inductor/169503 -> ciflow/inductor/169503 2025-12-04T08:57:44.1561339Z * [new tag] ciflow/inductor/169504 -> ciflow/inductor/169504 2025-12-04T08:57:44.1562368Z * [new tag] ciflow/inductor/169505 -> ciflow/inductor/169505 2025-12-04T08:57:44.1563644Z * [new tag] ciflow/inductor/169508 -> ciflow/inductor/169508 2025-12-04T08:57:44.1564452Z * [new tag] ciflow/inductor/169509 -> ciflow/inductor/169509 2025-12-04T08:57:44.1565643Z * [new tag] ciflow/inductor/169513 -> ciflow/inductor/169513 2025-12-04T08:57:44.1566379Z * [new tag] ciflow/inductor/169514 -> ciflow/inductor/169514 2025-12-04T08:57:44.1567098Z * [new tag] ciflow/inductor/169515 -> ciflow/inductor/169515 2025-12-04T08:57:44.1567856Z * [new tag] ciflow/inductor/169517 -> ciflow/inductor/169517 2025-12-04T08:57:44.1568585Z * [new tag] ciflow/inductor/169519 -> ciflow/inductor/169519 2025-12-04T08:57:44.1569336Z * [new tag] ciflow/inductor/169520 -> ciflow/inductor/169520 2025-12-04T08:57:44.1570070Z * [new tag] ciflow/inductor/169521 -> ciflow/inductor/169521 2025-12-04T08:57:44.1570787Z * [new tag] ciflow/inductor/169524 -> ciflow/inductor/169524 2025-12-04T08:57:44.1571531Z * [new tag] ciflow/inductor/169527 -> ciflow/inductor/169527 2025-12-04T08:57:44.1572273Z * [new tag] ciflow/inductor/169528 -> ciflow/inductor/169528 2025-12-04T08:57:44.1573199Z * [new tag] ciflow/inductor/169532 -> ciflow/inductor/169532 2025-12-04T08:57:44.1573888Z * [new tag] ciflow/inductor/169535 -> ciflow/inductor/169535 2025-12-04T08:57:44.1574614Z * [new tag] ciflow/inductor/169536 -> ciflow/inductor/169536 2025-12-04T08:57:44.1575359Z * [new tag] ciflow/inductor/169547 -> ciflow/inductor/169547 
2025-12-04T08:57:44.1576147Z * [new tag] ciflow/inductor/169548 -> ciflow/inductor/169548 2025-12-04T08:57:44.1577162Z * [new tag] ciflow/inductor/169549 -> ciflow/inductor/169549 2025-12-04T08:57:44.1577937Z * [new tag] ciflow/inductor/169551 -> ciflow/inductor/169551 2025-12-04T08:57:44.1578676Z * [new tag] ciflow/inductor/169552 -> ciflow/inductor/169552 2025-12-04T08:57:44.1579460Z * [new tag] ciflow/inductor/169553 -> ciflow/inductor/169553 2025-12-04T08:57:44.1580573Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-12-04T08:57:44.1581544Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-12-04T08:57:44.1582383Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-12-04T08:57:44.1583345Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-12-04T08:57:44.1583996Z * [new tag] ciflow/linux-aarch64/166075 -> ciflow/linux-aarch64/166075 2025-12-04T08:57:44.1584692Z * [new tag] ciflow/linux-aarch64/166876 -> ciflow/linux-aarch64/166876 2025-12-04T08:57:44.1585414Z * [new tag] ciflow/linux-aarch64/167981 -> ciflow/linux-aarch64/167981 2025-12-04T08:57:44.1586219Z * [new tag] ciflow/mps/166254 -> ciflow/mps/166254 2025-12-04T08:57:44.1586967Z * [new tag] ciflow/mps/169017 -> ciflow/mps/169017 2025-12-04T08:57:44.1587752Z * [new tag] ciflow/mps/169372 -> ciflow/mps/169372 2025-12-04T08:57:44.1588839Z * [new tag] ciflow/mps/169478 -> ciflow/mps/169478 2025-12-04T08:57:44.1589578Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-12-04T08:57:44.1590287Z * [new tag] ciflow/op-benchmark/166075 -> ciflow/op-benchmark/166075 2025-12-04T08:57:44.1590950Z * [new tag] ciflow/op-benchmark/169544 -> ciflow/op-benchmark/169544 2025-12-04T08:57:44.1591859Z * [new tag] ciflow/periodic-rocm-mi200/165997 -> ciflow/periodic-rocm-mi200/165997 2025-12-04T08:57:44.1592699Z * [new tag] ciflow/periodic-rocm-mi200/166517 -> ciflow/periodic-rocm-mi200/166517 2025-12-04T08:57:44.1593391Z * [new tag] ciflow/periodic-rocm-mi200/169063 -> ciflow/periodic-rocm-mi200/169063 2025-12-04T08:57:44.1594548Z * [new tag] ciflow/periodic-rocm-mi200/169425 -> ciflow/periodic-rocm-mi200/169425 2025-12-04T08:57:44.1595305Z * [new tag] ciflow/periodic-rocm-mi300/166517 -> ciflow/periodic-rocm-mi300/166517 2025-12-04T08:57:44.1596027Z * [new tag] ciflow/periodic-rocm-mi300/169063 -> ciflow/periodic-rocm-mi300/169063 2025-12-04T08:57:44.1596757Z * [new tag] ciflow/periodic-rocm-mi300/169425 -> ciflow/periodic-rocm-mi300/169425 2025-12-04T08:57:44.1597773Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-12-04T08:57:44.1598869Z * [new tag] ciflow/periodic/167207 -> ciflow/periodic/167207 2025-12-04T08:57:44.1599635Z * [new tag] ciflow/periodic/167978 -> ciflow/periodic/167978 2025-12-04T08:57:44.1600363Z * [new tag] ciflow/periodic/168096 -> ciflow/periodic/168096 2025-12-04T08:57:44.1601279Z * [new tag] ciflow/periodic/169286 -> ciflow/periodic/169286 2025-12-04T08:57:44.1602197Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-12-04T08:57:44.1603158Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-12-04T08:57:44.1603900Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-12-04T08:57:44.1604892Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-12-04T08:57:44.1606461Z * [new tag] ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T08:57:44.1607210Z * [new tag] 
ciflow/periodic/94512-point -> ciflow/periodic/94512-point 2025-12-04T08:57:44.1608420Z * [new tag] ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519 2025-12-04T08:57:44.1609283Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 2025-12-04T08:57:44.1610163Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 2025-12-04T08:57:44.1611138Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-12-04T08:57:44.1612222Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-12-04T08:57:44.1613325Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-12-04T08:57:44.1614091Z * [new tag] ciflow/pull/167207 -> ciflow/pull/167207 2025-12-04T08:57:44.1615232Z * [new tag] ciflow/quantization-periodic/169207 -> ciflow/quantization-periodic/169207 2025-12-04T08:57:44.1615929Z * [new tag] ciflow/rocm-mi200/165545 -> ciflow/rocm-mi200/165545 2025-12-04T08:57:44.1616861Z * [new tag] ciflow/rocm-mi200/165997 -> ciflow/rocm-mi200/165997 2025-12-04T08:57:44.1617712Z * [new tag] ciflow/rocm-mi200/168096 -> ciflow/rocm-mi200/168096 2025-12-04T08:57:44.1618642Z * [new tag] ciflow/rocm-mi200/168275 -> ciflow/rocm-mi200/168275 2025-12-04T08:57:44.1619292Z * [new tag] ciflow/rocm-mi200/169063 -> ciflow/rocm-mi200/169063 2025-12-04T08:57:44.1620226Z * [new tag] ciflow/rocm-mi200/169356 -> ciflow/rocm-mi200/169356 2025-12-04T08:57:44.1620991Z * [new tag] ciflow/rocm-mi200/169425 -> ciflow/rocm-mi200/169425 2025-12-04T08:57:44.1622067Z * [new tag] ciflow/rocm-mi300/165545 -> ciflow/rocm-mi300/165545 2025-12-04T08:57:44.1622889Z * [new tag] ciflow/rocm-mi300/167157 -> ciflow/rocm-mi300/167157 2025-12-04T08:57:44.1623586Z * [new tag] ciflow/rocm-mi300/168096 -> ciflow/rocm-mi300/168096 2025-12-04T08:57:44.1624301Z * [new tag] ciflow/rocm-mi300/169063 -> ciflow/rocm-mi300/169063 2025-12-04T08:57:44.1624997Z * [new tag] ciflow/rocm-mi300/169425 -> ciflow/rocm-mi300/169425 2025-12-04T08:57:44.1625927Z * [new tag] ciflow/rocm-mi355/167157 -> ciflow/rocm-mi355/167157 2025-12-04T08:57:44.1626608Z * [new tag] ciflow/rocm-mi355/168275 -> ciflow/rocm-mi355/168275 2025-12-04T08:57:44.1627304Z * [new tag] ciflow/rocm-mi355/169425 -> ciflow/rocm-mi355/169425 2025-12-04T08:57:44.1628327Z * [new tag] ciflow/rocm-navi31/168275 -> ciflow/rocm-navi31/168275 2025-12-04T08:57:44.1628985Z * [new tag] ciflow/rocm-navi31/169425 -> ciflow/rocm-navi31/169425 2025-12-04T08:57:44.1629942Z * [new tag] ciflow/rocm/115316 -> ciflow/rocm/115316 2025-12-04T08:57:44.1630608Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-12-04T08:57:44.1631291Z * [new tag] ciflow/rocm/160685 -> ciflow/rocm/160685 2025-12-04T08:57:44.1632015Z * [new tag] ciflow/rocm/161607 -> ciflow/rocm/161607 2025-12-04T08:57:44.1632725Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-12-04T08:57:44.1633641Z * [new tag] ciflow/rocm/165997 -> ciflow/rocm/165997 2025-12-04T08:57:44.1634332Z * [new tag] ciflow/rocm/166165 -> ciflow/rocm/166165 2025-12-04T08:57:44.1635012Z * [new tag] ciflow/rocm/166517 -> ciflow/rocm/166517 2025-12-04T08:57:44.1635674Z * [new tag] ciflow/rocm/167207 -> ciflow/rocm/167207 2025-12-04T08:57:44.1636474Z * [new tag] ciflow/rocm/167536 -> ciflow/rocm/167536 2025-12-04T08:57:44.1637093Z * [new tag] ciflow/rocm/167781 -> ciflow/rocm/167781 2025-12-04T08:57:44.1638059Z * [new tag] ciflow/rocm/167989 -> ciflow/rocm/167989 2025-12-04T08:57:44.1639042Z * [new tag] ciflow/rocm/168073 -> ciflow/rocm/168073 
2025-12-04T08:57:44.1639906Z * [new tag] ciflow/rocm/168195 -> ciflow/rocm/168195 2025-12-04T08:57:44.1640622Z * [new tag] ciflow/rocm/168939 -> ciflow/rocm/168939 2025-12-04T08:57:44.1641321Z * [new tag] ciflow/rocm/168971 -> ciflow/rocm/168971 2025-12-04T08:57:44.1642025Z * [new tag] ciflow/rocm/169024 -> ciflow/rocm/169024 2025-12-04T08:57:44.1642757Z * [new tag] ciflow/rocm/169200 -> ciflow/rocm/169200 2025-12-04T08:57:44.1643475Z * [new tag] ciflow/rocm/169216 -> ciflow/rocm/169216 2025-12-04T08:57:44.1644186Z * [new tag] ciflow/rocm/169312 -> ciflow/rocm/169312 2025-12-04T08:57:44.1644890Z * [new tag] ciflow/rocm/169380 -> ciflow/rocm/169380 2025-12-04T08:57:44.1645611Z * [new tag] ciflow/rocm/169427 -> ciflow/rocm/169427 2025-12-04T08:57:44.1646343Z * [new tag] ciflow/rocm/169455 -> ciflow/rocm/169455 2025-12-04T08:57:44.1647047Z * [new tag] ciflow/rocm/169470 -> ciflow/rocm/169470 2025-12-04T08:57:44.1647780Z * [new tag] ciflow/rocm/169471 -> ciflow/rocm/169471 2025-12-04T08:57:44.1648485Z * [new tag] ciflow/rocm/169472 -> ciflow/rocm/169472 2025-12-04T08:57:44.1649206Z * [new tag] ciflow/rocm/169514 -> ciflow/rocm/169514 2025-12-04T08:57:44.1650349Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-12-04T08:57:44.1651131Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-12-04T08:57:44.1652533Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-12-04T08:57:44.1652996Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-12-04T08:57:44.1654189Z * [new tag] ciflow/slow/167207 -> ciflow/slow/167207 2025-12-04T08:57:44.1654814Z * [new tag] ciflow/slow/168050 -> ciflow/slow/168050 2025-12-04T08:57:44.1655805Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-12-04T08:57:44.1657004Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-12-04T08:57:44.1658157Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-12-04T08:57:44.1659450Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 2025-12-04T08:57:44.1660502Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-12-04T08:57:44.1661531Z * [new tag] ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-12-04T08:57:44.1662444Z * [new tag] ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-12-04T08:57:44.1663263Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-12-04T08:57:44.1664776Z * [new tag] ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 2025-12-04T08:57:44.1665246Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-12-04T08:57:44.1666203Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-12-04T08:57:44.1667132Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-12-04T08:57:44.1668146Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-12-04T08:57:44.1669073Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-12-04T08:57:44.1670566Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-12-04T08:57:44.1671026Z * [new tag] ciflow/torchbench/168175 -> ciflow/torchbench/168175 2025-12-04T08:57:44.1671823Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-12-04T08:57:44.1672588Z * [new tag] ciflow/trunk/157149 -> ciflow/trunk/157149 2025-12-04T08:57:44.1673282Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-12-04T08:57:44.1673997Z * [new tag] ciflow/trunk/159718 -> 
ciflow/trunk/159718 2025-12-04T08:57:44.1674679Z * [new tag] ciflow/trunk/160685 -> ciflow/trunk/160685 2025-12-04T08:57:44.1675366Z * [new tag] ciflow/trunk/160729 -> ciflow/trunk/160729 2025-12-04T08:57:44.1676068Z * [new tag] ciflow/trunk/162275 -> ciflow/trunk/162275 2025-12-04T08:57:44.1676745Z * [new tag] ciflow/trunk/162795 -> ciflow/trunk/162795 2025-12-04T08:57:44.1677447Z * [new tag] ciflow/trunk/163245 -> ciflow/trunk/163245 2025-12-04T08:57:44.1678199Z * [new tag] ciflow/trunk/163942 -> ciflow/trunk/163942 2025-12-04T08:57:44.1678896Z * [new tag] ciflow/trunk/165274 -> ciflow/trunk/165274 2025-12-04T08:57:44.1680053Z * [new tag] ciflow/trunk/165483 -> ciflow/trunk/165483 2025-12-04T08:57:44.1681073Z * [new tag] ciflow/trunk/165728 -> ciflow/trunk/165728 2025-12-04T08:57:44.1681874Z * [new tag] ciflow/trunk/165922 -> ciflow/trunk/165922 2025-12-04T08:57:44.1682592Z * [new tag] ciflow/trunk/166075 -> ciflow/trunk/166075 2025-12-04T08:57:44.1683333Z * [new tag] ciflow/trunk/166165 -> ciflow/trunk/166165 2025-12-04T08:57:44.1684081Z * [new tag] ciflow/trunk/166829 -> ciflow/trunk/166829 2025-12-04T08:57:44.1685081Z * [new tag] ciflow/trunk/166843 -> ciflow/trunk/166843 2025-12-04T08:57:44.1685753Z * [new tag] ciflow/trunk/166876 -> ciflow/trunk/166876 2025-12-04T08:57:44.1686472Z * [new tag] ciflow/trunk/167207 -> ciflow/trunk/167207 2025-12-04T08:57:44.1687404Z * [new tag] ciflow/trunk/167536 -> ciflow/trunk/167536 2025-12-04T08:57:44.1688057Z * [new tag] ciflow/trunk/167552 -> ciflow/trunk/167552 2025-12-04T08:57:44.1688818Z * [new tag] ciflow/trunk/167555 -> ciflow/trunk/167555 2025-12-04T08:57:44.1689515Z * [new tag] ciflow/trunk/167599 -> ciflow/trunk/167599 2025-12-04T08:57:44.1690318Z * [new tag] ciflow/trunk/167659 -> ciflow/trunk/167659 2025-12-04T08:57:44.1691123Z * [new tag] ciflow/trunk/167672 -> ciflow/trunk/167672 2025-12-04T08:57:44.1691832Z * [new tag] ciflow/trunk/167742 -> ciflow/trunk/167742 2025-12-04T08:57:44.1692565Z * [new tag] ciflow/trunk/167781 -> ciflow/trunk/167781 2025-12-04T08:57:44.1693539Z * [new tag] ciflow/trunk/167837 -> ciflow/trunk/167837 2025-12-04T08:57:44.1694193Z * [new tag] ciflow/trunk/167887 -> ciflow/trunk/167887 2025-12-04T08:57:44.1694908Z * [new tag] ciflow/trunk/167978 -> ciflow/trunk/167978 2025-12-04T08:57:44.1695624Z * [new tag] ciflow/trunk/168050 -> ciflow/trunk/168050 2025-12-04T08:57:44.1697519Z * [new tag] ciflow/trunk/168051 -> ciflow/trunk/168051 2025-12-04T08:57:44.1698255Z * [new tag] ciflow/trunk/168096 -> ciflow/trunk/168096 2025-12-04T08:57:44.1698930Z * [new tag] ciflow/trunk/168127 -> ciflow/trunk/168127 2025-12-04T08:57:44.1699671Z * [new tag] ciflow/trunk/168157 -> ciflow/trunk/168157 2025-12-04T08:57:44.1700440Z * [new tag] ciflow/trunk/168175 -> ciflow/trunk/168175 2025-12-04T08:57:44.1701155Z * [new tag] ciflow/trunk/168209 -> ciflow/trunk/168209 2025-12-04T08:57:44.1702097Z * [new tag] ciflow/trunk/168213 -> ciflow/trunk/168213 2025-12-04T08:57:44.1703037Z * [new tag] ciflow/trunk/168226 -> ciflow/trunk/168226 2025-12-04T08:57:44.1703764Z * [new tag] ciflow/trunk/168262 -> ciflow/trunk/168262 2025-12-04T08:57:44.1704518Z * [new tag] ciflow/trunk/168275 -> ciflow/trunk/168275 2025-12-04T08:57:44.1705415Z * [new tag] ciflow/trunk/168328 -> ciflow/trunk/168328 2025-12-04T08:57:44.1706119Z * [new tag] ciflow/trunk/168368 -> ciflow/trunk/168368 2025-12-04T08:57:44.1706920Z * [new tag] ciflow/trunk/168917 -> ciflow/trunk/168917 2025-12-04T08:57:44.1707640Z * [new tag] ciflow/trunk/168933 -> ciflow/trunk/168933 
2025-12-04T08:57:44.1708581Z * [new tag] ciflow/trunk/168941 -> ciflow/trunk/168941 2025-12-04T08:57:44.1709359Z * [new tag] ciflow/trunk/168955 -> ciflow/trunk/168955 2025-12-04T08:57:44.1710131Z * [new tag] ciflow/trunk/168980 -> ciflow/trunk/168980 2025-12-04T08:57:44.1711161Z * [new tag] ciflow/trunk/169004 -> ciflow/trunk/169004 2025-12-04T08:57:44.1711848Z * [new tag] ciflow/trunk/169006 -> ciflow/trunk/169006 2025-12-04T08:57:44.1712586Z * [new tag] ciflow/trunk/169023 -> ciflow/trunk/169023 2025-12-04T08:57:44.1713325Z * [new tag] ciflow/trunk/169025 -> ciflow/trunk/169025 2025-12-04T08:57:44.1732301Z * [new tag] ciflow/trunk/169048 -> ciflow/trunk/169048 2025-12-04T08:57:44.1732651Z * [new tag] ciflow/trunk/169066 -> ciflow/trunk/169066 2025-12-04T08:57:44.1732855Z * [new tag] ciflow/trunk/169091 -> ciflow/trunk/169091 2025-12-04T08:57:44.1733063Z * [new tag] ciflow/trunk/169102 -> ciflow/trunk/169102 2025-12-04T08:57:44.1733362Z * [new tag] ciflow/trunk/169103 -> ciflow/trunk/169103 2025-12-04T08:57:44.1733545Z * [new tag] ciflow/trunk/169125 -> ciflow/trunk/169125 2025-12-04T08:57:44.1733740Z * [new tag] ciflow/trunk/169139 -> ciflow/trunk/169139 2025-12-04T08:57:44.1733925Z * [new tag] ciflow/trunk/169148 -> ciflow/trunk/169148 2025-12-04T08:57:44.1734135Z * [new tag] ciflow/trunk/169151 -> ciflow/trunk/169151 2025-12-04T08:57:44.1734327Z * [new tag] ciflow/trunk/169156 -> ciflow/trunk/169156 2025-12-04T08:57:44.1734509Z * [new tag] ciflow/trunk/169176 -> ciflow/trunk/169176 2025-12-04T08:57:44.1734701Z * [new tag] ciflow/trunk/169204 -> ciflow/trunk/169204 2025-12-04T08:57:44.1734886Z * [new tag] ciflow/trunk/169207 -> ciflow/trunk/169207 2025-12-04T08:57:44.1735079Z * [new tag] ciflow/trunk/169211 -> ciflow/trunk/169211 2025-12-04T08:57:44.1735262Z * [new tag] ciflow/trunk/169229 -> ciflow/trunk/169229 2025-12-04T08:57:44.1735447Z * [new tag] ciflow/trunk/169231 -> ciflow/trunk/169231 2025-12-04T08:57:44.1735641Z * [new tag] ciflow/trunk/169260 -> ciflow/trunk/169260 2025-12-04T08:57:44.1735827Z * [new tag] ciflow/trunk/169271 -> ciflow/trunk/169271 2025-12-04T08:57:44.1736516Z * [new tag] ciflow/trunk/169280 -> ciflow/trunk/169280 2025-12-04T08:57:44.1736979Z * [new tag] ciflow/trunk/169281 -> ciflow/trunk/169281 2025-12-04T08:57:44.1737170Z * [new tag] ciflow/trunk/169286 -> ciflow/trunk/169286 2025-12-04T08:57:44.1737372Z * [new tag] ciflow/trunk/169293 -> ciflow/trunk/169293 2025-12-04T08:57:44.1737560Z * [new tag] ciflow/trunk/169296 -> ciflow/trunk/169296 2025-12-04T08:57:44.1737759Z * [new tag] ciflow/trunk/169304 -> ciflow/trunk/169304 2025-12-04T08:57:44.1737961Z * [new tag] ciflow/trunk/169305 -> ciflow/trunk/169305 2025-12-04T08:57:44.1738152Z * [new tag] ciflow/trunk/169312 -> ciflow/trunk/169312 2025-12-04T08:57:44.1738353Z * [new tag] ciflow/trunk/169328 -> ciflow/trunk/169328 2025-12-04T08:57:44.1738546Z * [new tag] ciflow/trunk/169343 -> ciflow/trunk/169343 2025-12-04T08:57:44.1738742Z * [new tag] ciflow/trunk/169355 -> ciflow/trunk/169355 2025-12-04T08:57:44.1738946Z * [new tag] ciflow/trunk/169370 -> ciflow/trunk/169370 2025-12-04T08:57:44.1739203Z * [new tag] ciflow/trunk/169379 -> ciflow/trunk/169379 2025-12-04T08:57:44.1739986Z * [new tag] ciflow/trunk/169380 -> ciflow/trunk/169380 2025-12-04T08:57:44.1740718Z * [new tag] ciflow/trunk/169385 -> ciflow/trunk/169385 2025-12-04T08:57:44.1741493Z * [new tag] ciflow/trunk/169387 -> ciflow/trunk/169387 2025-12-04T08:57:44.1742473Z * [new tag] ciflow/trunk/169410 -> ciflow/trunk/169410 
2025-12-04T08:57:44.1743152Z * [new tag] ciflow/trunk/169412 -> ciflow/trunk/169412 2025-12-04T08:57:44.1743916Z * [new tag] ciflow/trunk/169418 -> ciflow/trunk/169418 2025-12-04T08:57:44.1744653Z * [new tag] ciflow/trunk/169423 -> ciflow/trunk/169423 2025-12-04T08:57:44.1745414Z * [new tag] ciflow/trunk/169427 -> ciflow/trunk/169427 2025-12-04T08:57:44.1746157Z * [new tag] ciflow/trunk/169430 -> ciflow/trunk/169430 2025-12-04T08:57:44.1746905Z * [new tag] ciflow/trunk/169437 -> ciflow/trunk/169437 2025-12-04T08:57:44.1747669Z * [new tag] ciflow/trunk/169442 -> ciflow/trunk/169442 2025-12-04T08:57:44.1748577Z * [new tag] ciflow/trunk/169452 -> ciflow/trunk/169452 2025-12-04T08:57:44.1749798Z * [new tag] ciflow/trunk/169454 -> ciflow/trunk/169454 2025-12-04T08:57:44.1750454Z * [new tag] ciflow/trunk/169459 -> ciflow/trunk/169459 2025-12-04T08:57:44.1751380Z * [new tag] ciflow/trunk/169474 -> ciflow/trunk/169474 2025-12-04T08:57:44.1752069Z * [new tag] ciflow/trunk/169475 -> ciflow/trunk/169475 2025-12-04T08:57:44.1752795Z * [new tag] ciflow/trunk/169476 -> ciflow/trunk/169476 2025-12-04T08:57:44.1753688Z * [new tag] ciflow/trunk/169487 -> ciflow/trunk/169487 2025-12-04T08:57:44.1754386Z * [new tag] ciflow/trunk/169497 -> ciflow/trunk/169497 2025-12-04T08:57:44.1755167Z * [new tag] ciflow/trunk/169503 -> ciflow/trunk/169503 2025-12-04T08:57:44.1755888Z * [new tag] ciflow/trunk/169505 -> ciflow/trunk/169505 2025-12-04T08:57:44.1756627Z * [new tag] ciflow/trunk/169507 -> ciflow/trunk/169507 2025-12-04T08:57:44.1757374Z * [new tag] ciflow/trunk/169514 -> ciflow/trunk/169514 2025-12-04T08:57:44.1758086Z * [new tag] ciflow/trunk/169517 -> ciflow/trunk/169517 2025-12-04T08:57:44.1758903Z * [new tag] ciflow/trunk/169519 -> ciflow/trunk/169519 2025-12-04T08:57:44.1759566Z * [new tag] ciflow/trunk/169528 -> ciflow/trunk/169528 2025-12-04T08:57:44.1760294Z * [new tag] ciflow/trunk/169541 -> ciflow/trunk/169541 2025-12-04T08:57:44.1761211Z * [new tag] ciflow/trunk/169555 -> ciflow/trunk/169555 2025-12-04T08:57:44.1762447Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-12-04T08:57:44.1763262Z * [new tag] ciflow/vllm/165270 -> ciflow/vllm/165270 2025-12-04T08:57:44.1763941Z * [new tag] ciflow/vllm/165274 -> ciflow/vllm/165274 2025-12-04T08:57:44.1764656Z * [new tag] ciflow/vllm/166494 -> ciflow/vllm/166494 2025-12-04T08:57:44.1765330Z * [new tag] ciflow/vllm/169219 -> ciflow/vllm/169219 2025-12-04T08:57:44.1766009Z * [new tag] ciflow/vllm/169220 -> ciflow/vllm/169220 2025-12-04T08:57:44.1766885Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-12-04T08:57:44.1767531Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-12-04T08:57:44.1768470Z * [new tag] ciflow/xpu/161940 -> ciflow/xpu/161940 2025-12-04T08:57:44.1769184Z * [new tag] ciflow/xpu/163251 -> ciflow/xpu/163251 2025-12-04T08:57:44.1769885Z * [new tag] ciflow/xpu/166829 -> ciflow/xpu/166829 2025-12-04T08:57:44.1770563Z * [new tag] ciflow/xpu/166843 -> ciflow/xpu/166843 2025-12-04T08:57:44.1771254Z * [new tag] ciflow/xpu/167972 -> ciflow/xpu/167972 2025-12-04T08:57:44.1771964Z * [new tag] ciflow/xpu/167981 -> ciflow/xpu/167981 2025-12-04T08:57:44.1772651Z * [new tag] ciflow/xpu/168213 -> ciflow/xpu/168213 2025-12-04T08:57:44.1773427Z * [new tag] ciflow/xpu/168262 -> ciflow/xpu/168262 2025-12-04T08:57:44.1774097Z * [new tag] ciflow/xpu/168328 -> ciflow/xpu/168328 2025-12-04T08:57:44.1775126Z * [new tag] ciflow/xpu/168950 -> ciflow/xpu/168950 2025-12-04T08:57:44.1776238Z * [new tag] ciflow/xpu/169039 -> ciflow/xpu/169039 
2025-12-04T08:57:44.1777484Z * [new tag] ciflow/xpu/169200 -> ciflow/xpu/169200 2025-12-04T08:57:44.1778203Z * [new tag] ciflow/xpu/169203 -> ciflow/xpu/169203 2025-12-04T08:57:44.1778933Z * [new tag] ciflow/xpu/169229 -> ciflow/xpu/169229 2025-12-04T08:57:44.1779696Z * [new tag] ciflow/xpu/169230 -> ciflow/xpu/169230 2025-12-04T08:57:44.1780420Z * [new tag] ciflow/xpu/169231 -> ciflow/xpu/169231 2025-12-04T08:57:44.1781352Z * [new tag] ciflow/xpu/169241 -> ciflow/xpu/169241 2025-12-04T08:57:44.1782050Z * [new tag] ciflow/xpu/169280 -> ciflow/xpu/169280 2025-12-04T08:57:44.1782781Z * [new tag] ciflow/xpu/169296 -> ciflow/xpu/169296 2025-12-04T08:57:44.1783775Z * [new tag] ciflow/xpu/169353 -> ciflow/xpu/169353 2025-12-04T08:57:44.1784454Z * [new tag] ciflow/xpu/169410 -> ciflow/xpu/169410 2025-12-04T08:57:44.1785207Z * [new tag] ciflow/xpu/169442 -> ciflow/xpu/169442 2025-12-04T08:57:44.1786140Z * [new tag] ciflow/xpu/169555 -> ciflow/xpu/169555 2025-12-04T08:57:44.1787002Z * [new tag] cslpull75 -> cslpull75 2025-12-04T08:57:44.1787765Z * [new tag] cslpull76 -> cslpull76 2025-12-04T08:57:44.1788561Z * [new tag] cslpull77 -> cslpull77 2025-12-04T08:57:44.1789538Z * [new tag] cslpull78 -> cslpull78 2025-12-04T08:57:44.1790667Z * [new tag] cslpull79 -> cslpull79 2025-12-04T08:57:44.1791673Z * [new tag] cslpull80 -> cslpull80 2025-12-04T08:57:44.1792539Z * [new tag] cslpull81 -> cslpull81 2025-12-04T08:57:44.1793273Z * [new tag] cslpull82 -> cslpull82 2025-12-04T08:57:44.1794242Z * [new tag] cslpull83 -> cslpull83 2025-12-04T08:57:44.1794998Z * [new tag] cslpull84 -> cslpull84 2025-12-04T08:57:44.1795842Z * [new tag] cslpull85 -> cslpull85 2025-12-04T08:57:44.1796729Z * [new tag] cslpull86 -> cslpull86 2025-12-04T08:57:44.1797580Z * [new tag] cslpull87 -> cslpull87 2025-12-04T08:57:44.1798458Z * [new tag] cslpull88 -> cslpull88 2025-12-04T08:57:44.1799336Z * [new tag] cslpull89 -> cslpull89 2025-12-04T08:57:44.1799910Z * [new tag] cslpull90 -> cslpull90 2025-12-04T08:57:44.1801235Z * [new tag] cslpull91 -> cslpull91 2025-12-04T08:57:44.1801976Z * [new tag] cslpull92 -> cslpull92 2025-12-04T08:57:44.1802886Z * [new tag] flight_5 -> flight_5 2025-12-04T08:57:44.1803938Z * [new tag] flight_5.1 -> flight_5.1 2025-12-04T08:57:44.1804848Z * [new tag] flight_5.2 -> flight_5.2 2025-12-04T08:57:44.1805708Z * [new tag] flight_5.3 -> flight_5.3 2025-12-04T08:57:44.1806573Z * [new tag] forpull1 -> forpull1 2025-12-04T08:57:44.1807605Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-12-04T08:57:44.1808495Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 2025-12-04T08:57:44.1809244Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-12-04T08:57:44.1810205Z * [new tag] nightly-binary -> nightly-binary 2025-12-04T08:57:44.1811095Z * [new tag] sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-12-04T08:57:44.1812062Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-12-04T08:57:44.1813397Z * [new tag] trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 -> trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 2025-12-04T08:57:44.1814162Z * [new tag] trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e -> trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e 2025-12-04T08:57:44.1815491Z * [new tag] trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 -> trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 2025-12-04T08:57:44.1816589Z * [new tag] trunk/07dcc0b83db3211653a38565a24e15acdba75654 -> trunk/07dcc0b83db3211653a38565a24e15acdba75654 2025-12-04T08:57:44.1817827Z * [new tag] 
trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb -> trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb 2025-12-04T08:57:44.1818636Z * [new tag] trunk/088048f2fea28ff7d450f65c72419ca45780d30b -> trunk/088048f2fea28ff7d450f65c72419ca45780d30b 2025-12-04T08:57:44.1819537Z * [new tag] trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 -> trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 2025-12-04T08:57:44.1820470Z * [new tag] trunk/0b80a4c62b94402844bf221791c096b0035c6d75 -> trunk/0b80a4c62b94402844bf221791c096b0035c6d75 2025-12-04T08:57:44.1821906Z * [new tag] trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 -> trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 2025-12-04T08:57:44.1822830Z * [new tag] trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 -> trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 2025-12-04T08:57:44.1823863Z * [new tag] trunk/135f3753c418a6879b1954904184937b67e61688 -> trunk/135f3753c418a6879b1954904184937b67e61688 2025-12-04T08:57:44.1824724Z * [new tag] trunk/15da21026cb13cd20257dc9e96830db108743c10 -> trunk/15da21026cb13cd20257dc9e96830db108743c10 2025-12-04T08:57:44.1825675Z * [new tag] trunk/166efdad2ac827f30fb02504c6017520257f88ec -> trunk/166efdad2ac827f30fb02504c6017520257f88ec 2025-12-04T08:57:44.1826568Z * [new tag] trunk/174272c15fae553d8488140af931f7d8050a313f -> trunk/174272c15fae553d8488140af931f7d8050a313f 2025-12-04T08:57:44.1827802Z * [new tag] trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 -> trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 2025-12-04T08:57:44.1828642Z * [new tag] trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 -> trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 2025-12-04T08:57:44.1829542Z * [new tag] trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 -> trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 2025-12-04T08:57:44.1830416Z * [new tag] trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 -> trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 2025-12-04T08:57:44.1831318Z * [new tag] trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e -> trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e 2025-12-04T08:57:44.1832153Z * [new tag] trunk/1c87554d74140eaee964ca8b1832cede67f5f520 -> trunk/1c87554d74140eaee964ca8b1832cede67f5f520 2025-12-04T08:57:44.1833167Z * [new tag] trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 -> trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 2025-12-04T08:57:44.1834144Z * [new tag] trunk/1cee47d6ce0a02227185b566593f002dd639ca0c -> trunk/1cee47d6ce0a02227185b566593f002dd639ca0c 2025-12-04T08:57:44.1834929Z * [new tag] trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d -> trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d 2025-12-04T08:57:44.1835845Z * [new tag] trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 -> trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 2025-12-04T08:57:44.1836816Z * [new tag] trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de -> trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de 2025-12-04T08:57:44.1837705Z * [new tag] trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 -> trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 2025-12-04T08:57:44.1838546Z * [new tag] trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 -> trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 2025-12-04T08:57:44.1839420Z * [new tag] trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f -> trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f 2025-12-04T08:57:44.1840432Z * [new tag] trunk/285779b1621cf9f073a062b0889a642d200308d9 -> trunk/285779b1621cf9f073a062b0889a642d200308d9 2025-12-04T08:57:44.1841190Z * [new tag] trunk/2887faaec6295d081580d09fce161201826c6d87 -> trunk/2887faaec6295d081580d09fce161201826c6d87 
2025-12-04T08:57:44.1842136Z * [new tag] trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc -> trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc 2025-12-04T08:57:44.1842992Z * [new tag] trunk/29856679769b3dede478767e2fe6cfb51197cb25 -> trunk/29856679769b3dede478767e2fe6cfb51197cb25 2025-12-04T08:57:44.1844044Z * [new tag] trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 -> trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 2025-12-04T08:57:44.1844909Z * [new tag] trunk/2ac3ef882afb23136adc188975f0a8802fc68adf -> trunk/2ac3ef882afb23136adc188975f0a8802fc68adf 2025-12-04T08:57:44.1845660Z * [new tag] trunk/2bec68e73b64715354af076ad309335f943e36cd -> trunk/2bec68e73b64715354af076ad309335f943e36cd 2025-12-04T08:57:44.1846548Z * [new tag] trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 -> trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 2025-12-04T08:57:44.1847481Z * [new tag] trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 -> trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 2025-12-04T08:57:44.1848424Z * [new tag] trunk/2df6058f116a65722a0e03073402feb242572d35 -> trunk/2df6058f116a65722a0e03073402feb242572d35 2025-12-04T08:57:44.1849289Z * [new tag] trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec -> trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec 2025-12-04T08:57:44.1850214Z * [new tag] trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 -> trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 2025-12-04T08:57:44.1851074Z * [new tag] trunk/305168768a95d69c444df5cd334bb774edfe06f1 -> trunk/305168768a95d69c444df5cd334bb774edfe06f1 2025-12-04T08:57:44.1852454Z * [new tag] trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 -> trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 2025-12-04T08:57:44.1853247Z * [new tag] trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 -> trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 2025-12-04T08:57:44.1854203Z * [new tag] trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 -> trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 2025-12-04T08:57:44.1855060Z * [new tag] trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf -> trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf 2025-12-04T08:57:44.1855977Z * [new tag] trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee -> trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee 2025-12-04T08:57:44.1857211Z * [new tag] trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 -> trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 2025-12-04T08:57:44.1858018Z * [new tag] trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 -> trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 2025-12-04T08:57:44.1859195Z * [new tag] trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae -> trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae 2025-12-04T08:57:44.1860067Z * [new tag] trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f -> trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f 2025-12-04T08:57:44.1860961Z * [new tag] trunk/42e9005cda22da3f1c559c3649218cebd671027c -> trunk/42e9005cda22da3f1c559c3649218cebd671027c 2025-12-04T08:57:44.1861885Z * [new tag] trunk/43b94713bbf340d3c124fde02d0f73add4021247 -> trunk/43b94713bbf340d3c124fde02d0f73add4021247 2025-12-04T08:57:44.1862770Z * [new tag] trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c -> trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c 2025-12-04T08:57:44.1863647Z * [new tag] trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a -> trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a 2025-12-04T08:57:44.1864480Z * [new tag] trunk/45d310ad84854dff730c0b12e577d7998d978686 -> trunk/45d310ad84854dff730c0b12e577d7998d978686 2025-12-04T08:57:44.1865743Z * [new tag] trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 -> 
trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 2025-12-04T08:57:44.1866468Z * [new tag] trunk/481e5ab336275bd3acd5fa8a611b05b4469012af -> trunk/481e5ab336275bd3acd5fa8a611b05b4469012af 2025-12-04T08:57:44.1867475Z * [new tag] trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 -> trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 2025-12-04T08:57:44.1868380Z * [new tag] trunk/49a04d26088acc17d948ddd66920f3e16371e873 -> trunk/49a04d26088acc17d948ddd66920f3e16371e873 2025-12-04T08:57:44.1869391Z * [new tag] trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 -> trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 2025-12-04T08:57:44.1870108Z * [new tag] trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f -> trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f 2025-12-04T08:57:44.1871256Z * [new tag] trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa -> trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa 2025-12-04T08:57:44.1872058Z * [new tag] trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c -> trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c 2025-12-04T08:57:44.1873622Z * [new tag] trunk/4fefb8e7e942386ffac764a41b232241f82bea3a -> trunk/4fefb8e7e942386ffac764a41b232241f82bea3a 2025-12-04T08:57:44.1874495Z * [new tag] trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d -> trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d 2025-12-04T08:57:44.1875396Z * [new tag] trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 -> trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 2025-12-04T08:57:44.1876308Z * [new tag] trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 -> trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 2025-12-04T08:57:44.1877295Z * [new tag] trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a -> trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a 2025-12-04T08:57:44.1878184Z * [new tag] trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 -> trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 2025-12-04T08:57:44.1879064Z * [new tag] trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 -> trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 2025-12-04T08:57:44.1880047Z * [new tag] trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 -> trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 2025-12-04T08:57:44.1880922Z * [new tag] trunk/5634469fda9e5d98869c82c7d03bb08914245f96 -> trunk/5634469fda9e5d98869c82c7d03bb08914245f96 2025-12-04T08:57:44.1881687Z * [new tag] trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc -> trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc 2025-12-04T08:57:44.1882594Z * [new tag] trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 -> trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 2025-12-04T08:57:44.1883480Z * [new tag] trunk/597930f6b568852356ca9795dac76f9e4653adbd -> trunk/597930f6b568852356ca9795dac76f9e4653adbd 2025-12-04T08:57:44.1884230Z * [new tag] trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 -> trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 2025-12-04T08:57:44.1885241Z * [new tag] trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 -> trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 2025-12-04T08:57:44.1886184Z * [new tag] trunk/5a607febc04c3a2b5824c75f3f60307867439a2c -> trunk/5a607febc04c3a2b5824c75f3f60307867439a2c 2025-12-04T08:57:44.1887116Z * [new tag] trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b -> trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b 2025-12-04T08:57:44.1887869Z * [new tag] trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c -> trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c 2025-12-04T08:57:44.1888696Z * [new tag] trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 -> trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 2025-12-04T08:57:44.1889640Z * [new tag] 
trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 -> trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 2025-12-04T08:57:44.1890611Z * [new tag] trunk/61be54a31dc09b59d99b62176fb935aee0b924ef -> trunk/61be54a31dc09b59d99b62176fb935aee0b924ef 2025-12-04T08:57:44.1891474Z * [new tag] trunk/62d3ccd71484ed6a760d909b41487101bbc65719 -> trunk/62d3ccd71484ed6a760d909b41487101bbc65719 2025-12-04T08:57:44.1892340Z * [new tag] trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b -> trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b 2025-12-04T08:57:44.1893235Z * [new tag] trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a -> trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a 2025-12-04T08:57:44.1894203Z * [new tag] trunk/66004b993744b4106bf8afaba71f3c228a804206 -> trunk/66004b993744b4106bf8afaba71f3c228a804206 2025-12-04T08:57:44.1895092Z * [new tag] trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 -> trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 2025-12-04T08:57:44.1895977Z * [new tag] trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 -> trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 2025-12-04T08:57:44.1897168Z * [new tag] trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d -> trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d 2025-12-04T08:57:44.1898101Z * [new tag] trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b -> trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b 2025-12-04T08:57:44.1899002Z * [new tag] trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 -> trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 2025-12-04T08:57:44.1899910Z * [new tag] trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 -> trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 2025-12-04T08:57:44.1900909Z * [new tag] trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec -> trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec 2025-12-04T08:57:44.1901792Z * [new tag] trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 -> trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 2025-12-04T08:57:44.1902717Z * [new tag] trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d -> trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d 2025-12-04T08:57:44.1903753Z * [new tag] trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a -> trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a 2025-12-04T08:57:44.1904729Z * [new tag] trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e -> trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e 2025-12-04T08:57:44.1905623Z * [new tag] trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 -> trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 2025-12-04T08:57:44.1906508Z * [new tag] trunk/70d797a5fc109b20a517646fcaa819477cd0d485 -> trunk/70d797a5fc109b20a517646fcaa819477cd0d485 2025-12-04T08:57:44.1907386Z * [new tag] trunk/7348cb355ff0a6f79cd4871215aea72185748734 -> trunk/7348cb355ff0a6f79cd4871215aea72185748734 2025-12-04T08:57:44.1908406Z * [new tag] trunk/74fe26a1ebe32931783569f2e762e3c2c974901f -> trunk/74fe26a1ebe32931783569f2e762e3c2c974901f 2025-12-04T08:57:44.1909413Z * [new tag] trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 -> trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 2025-12-04T08:57:44.1910333Z * [new tag] trunk/7741edd4ed665f3988052e260863efb508d61a03 -> trunk/7741edd4ed665f3988052e260863efb508d61a03 2025-12-04T08:57:44.1911270Z * [new tag] trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 -> trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 2025-12-04T08:57:44.1912677Z * [new tag] trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 -> trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 2025-12-04T08:57:44.1914434Z * [new tag] trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 -> trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 
2025-12-04T08:57:44.1914863Z * [new tag] trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca -> trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca 2025-12-04T08:57:44.1915595Z * [new tag] trunk/7b7af390ea8541c611d1ce2018a6934188fc197b -> trunk/7b7af390ea8541c611d1ce2018a6934188fc197b 2025-12-04T08:57:44.1916247Z * [new tag] trunk/7ba4680f3755a560af81aa0f688791e367aa3609 -> trunk/7ba4680f3755a560af81aa0f688791e367aa3609 2025-12-04T08:57:44.1917120Z * [new tag] trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b -> trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b 2025-12-04T08:57:44.1917807Z * [new tag] trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T08:57:44.1918616Z * [new tag] trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 -> trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 2025-12-04T08:57:44.1919519Z * [new tag] trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed -> trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed 2025-12-04T08:57:44.1920392Z * [new tag] trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 -> trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 2025-12-04T08:57:44.1921837Z * [new tag] trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e -> trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e 2025-12-04T08:57:44.1922460Z * [new tag] trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead -> trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead 2025-12-04T08:57:44.1923437Z * [new tag] trunk/81af382128efa094d8702e18f2c133760904c718 -> trunk/81af382128efa094d8702e18f2c133760904c718 2025-12-04T08:57:44.1924694Z * [new tag] trunk/84149583d483e9c973c9a0feda70e4f3964947b0 -> trunk/84149583d483e9c973c9a0feda70e4f3964947b0 2025-12-04T08:57:44.1925915Z * [new tag] trunk/85a315917efe82c24306be805c584ec044951c75 -> trunk/85a315917efe82c24306be805c584ec044951c75 2025-12-04T08:57:44.1926770Z * [new tag] trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece -> trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece 2025-12-04T08:57:44.1927537Z * [new tag] trunk/892640e25aeefa8007c5af837214b4502b6b62a6 -> trunk/892640e25aeefa8007c5af837214b4502b6b62a6 2025-12-04T08:57:44.1928741Z * [new tag] trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 -> trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 2025-12-04T08:57:44.1929545Z * [new tag] trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c -> trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c 2025-12-04T08:57:44.1930490Z * [new tag] trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 -> trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 2025-12-04T08:57:44.1931456Z * [new tag] trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 -> trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 2025-12-04T08:57:44.1932457Z * [new tag] trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca -> trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca 2025-12-04T08:57:44.1933357Z * [new tag] trunk/90b27e7e8352cde97d32ddad24740ef819633f38 -> trunk/90b27e7e8352cde97d32ddad24740ef819633f38 2025-12-04T08:57:44.1934335Z * [new tag] trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 -> trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 2025-12-04T08:57:44.1935148Z * [new tag] trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c -> trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c 2025-12-04T08:57:44.1936001Z * [new tag] trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 -> trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 2025-12-04T08:57:44.1937404Z * [new tag] trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 -> trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 2025-12-04T08:57:44.1938336Z * [new tag] trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa -> 
trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa 2025-12-04T08:57:44.1939259Z * [new tag] trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d -> trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d 2025-12-04T08:57:44.1940182Z * [new tag] trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 -> trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 2025-12-04T08:57:44.1941118Z * [new tag] trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 -> trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 2025-12-04T08:57:44.1942059Z * [new tag] trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d -> trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d 2025-12-04T08:57:44.1942996Z * [new tag] trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a -> trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a 2025-12-04T08:57:44.1943990Z * [new tag] trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 -> trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 2025-12-04T08:57:44.1945010Z * [new tag] trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 -> trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 2025-12-04T08:57:44.1945933Z * [new tag] trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa -> trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa 2025-12-04T08:57:44.1946812Z * [new tag] trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d -> trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d 2025-12-04T08:57:44.1948076Z * [new tag] trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c -> trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c 2025-12-04T08:57:44.1949002Z * [new tag] trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 -> trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 2025-12-04T08:57:44.1949934Z * [new tag] trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c -> trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c 2025-12-04T08:57:44.1950654Z * [new tag] trunk/a7dc6dab9ad911259d4801c502907e531594db45 -> trunk/a7dc6dab9ad911259d4801c502907e531594db45 2025-12-04T08:57:44.1951622Z * [new tag] trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 -> trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 2025-12-04T08:57:44.1952522Z * [new tag] trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e -> trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e 2025-12-04T08:57:44.1953660Z * [new tag] trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e -> trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e 2025-12-04T08:57:44.1954389Z * [new tag] trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e -> trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e 2025-12-04T08:57:44.1955155Z * [new tag] trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 -> trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 2025-12-04T08:57:44.1956064Z * [new tag] trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 -> trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 2025-12-04T08:57:44.1957045Z * [new tag] trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 -> trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 2025-12-04T08:57:44.1957950Z * [new tag] trunk/b39813b4a04931682b0491adba2138d01d716d99 -> trunk/b39813b4a04931682b0491adba2138d01d716d99 2025-12-04T08:57:44.1958867Z * [new tag] trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 -> trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 2025-12-04T08:57:44.1959795Z * [new tag] trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 -> trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 2025-12-04T08:57:44.1960773Z * [new tag] trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a -> trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a 2025-12-04T08:57:44.1961783Z * [new tag] trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 -> trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 2025-12-04T08:57:44.1962705Z * [new tag] 
trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 -> trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 2025-12-04T08:57:44.1963638Z * [new tag] trunk/b7d60685f8cbc939b68a20871e90db67e729329b -> trunk/b7d60685f8cbc939b68a20871e90db67e729329b 2025-12-04T08:57:44.1964754Z * [new tag] trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e -> trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e 2025-12-04T08:57:44.1965672Z * [new tag] trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf -> trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf 2025-12-04T08:57:44.1966556Z * [new tag] trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 -> trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 2025-12-04T08:57:44.1967500Z * [new tag] trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f -> trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f 2025-12-04T08:57:44.1968432Z * [new tag] trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f -> trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f 2025-12-04T08:57:44.1969342Z * [new tag] trunk/bb3034198b459401fabeab254e1b99f0115046e2 -> trunk/bb3034198b459401fabeab254e1b99f0115046e2 2025-12-04T08:57:44.1970241Z * [new tag] trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 -> trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 2025-12-04T08:57:44.1971248Z * [new tag] trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 -> trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 2025-12-04T08:57:44.1972467Z * [new tag] trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 -> trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 2025-12-04T08:57:44.1973707Z * [new tag] trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 -> trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 2025-12-04T08:57:44.1974889Z * [new tag] trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 -> trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 2025-12-04T08:57:44.1975767Z * [new tag] trunk/c0660bcee27e7d7731634e274576a7081882bede -> trunk/c0660bcee27e7d7731634e274576a7081882bede 2025-12-04T08:57:44.1976950Z * [new tag] trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac -> trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac 2025-12-04T08:57:44.1978153Z * [new tag] trunk/c55b1e8f61d041ee436d697449eb028931d574fb -> trunk/c55b1e8f61d041ee436d697449eb028931d574fb 2025-12-04T08:57:44.1978910Z * [new tag] trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 -> trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 2025-12-04T08:57:44.1980142Z * [new tag] trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 -> trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 2025-12-04T08:57:44.1981112Z * [new tag] trunk/cc0853af42122f8185321f542616f4474e717f09 -> trunk/cc0853af42122f8185321f542616f4474e717f09 2025-12-04T08:57:44.1981958Z * [new tag] trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 -> trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 2025-12-04T08:57:44.1983009Z * [new tag] trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a -> trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a 2025-12-04T08:57:44.1983928Z * [new tag] trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace -> trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace 2025-12-04T08:57:44.1984828Z * [new tag] trunk/d16447dacaf2420ea175f0c275c75da951f57d39 -> trunk/d16447dacaf2420ea175f0c275c75da951f57d39 2025-12-04T08:57:44.1985745Z * [new tag] trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 -> trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 2025-12-04T08:57:44.1986710Z * [new tag] trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 -> trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 2025-12-04T08:57:44.1987734Z * [new tag] trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf -> trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf 
2025-12-04T08:57:44.1988786Z * [new tag] trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 -> trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 2025-12-04T08:57:44.1989624Z * [new tag] trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d -> trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d 2025-12-04T08:57:44.1990544Z * [new tag] trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 -> trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 2025-12-04T08:57:44.1991442Z * [new tag] trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 -> trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 2025-12-04T08:57:44.1992340Z * [new tag] trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e -> trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e 2025-12-04T08:57:44.1993276Z * [new tag] trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a -> trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a 2025-12-04T08:57:44.1994187Z * [new tag] trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b -> trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b 2025-12-04T08:57:44.1995106Z * [new tag] trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec -> trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec 2025-12-04T08:57:44.1996124Z * [new tag] trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf -> trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf 2025-12-04T08:57:44.1996990Z * [new tag] trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd -> trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd 2025-12-04T08:57:44.1997972Z * [new tag] trunk/dd18a75336a4fbd7497955cc5665904724fce889 -> trunk/dd18a75336a4fbd7497955cc5665904724fce889 2025-12-04T08:57:44.1998956Z * [new tag] trunk/ded9bcd61a059bf723e6e84689552962b480ea77 -> trunk/ded9bcd61a059bf723e6e84689552962b480ea77 2025-12-04T08:57:44.2000200Z * [new tag] trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c -> trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c 2025-12-04T08:57:44.2001176Z * [new tag] trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b -> trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b 2025-12-04T08:57:44.2001949Z * [new tag] trunk/e3f24fd73ad74c6e7176687986436956c7c18235 -> trunk/e3f24fd73ad74c6e7176687986436956c7c18235 2025-12-04T08:57:44.2002917Z * [new tag] trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e -> trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e 2025-12-04T08:57:44.2003923Z * [new tag] trunk/ea7035f462a0d2830865ee86c832bd101e1427fc -> trunk/ea7035f462a0d2830865ee86c832bd101e1427fc 2025-12-04T08:57:44.2004734Z * [new tag] trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 -> trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 2025-12-04T08:57:44.2005667Z * [new tag] trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf -> trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf 2025-12-04T08:57:44.2006586Z * [new tag] trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e -> trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e 2025-12-04T08:57:44.2007493Z * [new tag] trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e -> trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e 2025-12-04T08:57:44.2009003Z * [new tag] trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 -> trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 2025-12-04T08:57:44.2009864Z * [new tag] trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 -> trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 2025-12-04T08:57:44.2010793Z * [new tag] trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 -> trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 2025-12-04T08:57:44.2011680Z * [new tag] trunk/f1076f5510920044912247b1abb8760cb820f598 -> trunk/f1076f5510920044912247b1abb8760cb820f598 2025-12-04T08:57:44.2012587Z * [new tag] trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 -> 
trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 2025-12-04T08:57:44.2013483Z * [new tag] trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 -> trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 2025-12-04T08:57:44.2014406Z * [new tag] trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 -> trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 2025-12-04T08:57:44.2015227Z * [new tag] trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 -> trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 2025-12-04T08:57:44.2016164Z * [new tag] trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 -> trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 2025-12-04T08:57:44.2017433Z * [new tag] trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 -> trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 2025-12-04T08:57:44.2018239Z * [new tag] trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 -> trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 2025-12-04T08:57:44.2019319Z * [new tag] trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b -> trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b 2025-12-04T08:57:44.2020224Z * [new tag] trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 -> trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 2025-12-04T08:57:44.2024960Z * [new tag] trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 -> trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 2025-12-04T08:57:44.2025999Z * [new tag] trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 -> trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 2025-12-04T08:57:44.2027199Z * [new tag] trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 -> trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:44.2027847Z * [new tag] v0.1.1 -> v0.1.1 2025-12-04T08:57:44.2028595Z * [new tag] v0.1.10 -> v0.1.10 2025-12-04T08:57:44.2029555Z * [new tag] v0.1.11 -> v0.1.11 2025-12-04T08:57:44.2030493Z * [new tag] v0.1.12 -> v0.1.12 2025-12-04T08:57:44.2031782Z * [new tag] v0.1.2 -> v0.1.2 2025-12-04T08:57:44.2032761Z * [new tag] v0.1.3 -> v0.1.3 2025-12-04T08:57:44.2033457Z * [new tag] v0.1.4 -> v0.1.4 2025-12-04T08:57:44.2034342Z * [new tag] v0.1.5 -> v0.1.5 2025-12-04T08:57:44.2035174Z * [new tag] v0.1.6 -> v0.1.6 2025-12-04T08:57:44.2035899Z * [new tag] v0.1.7 -> v0.1.7 2025-12-04T08:57:44.2036659Z * [new tag] v0.1.8 -> v0.1.8 2025-12-04T08:57:44.2037515Z * [new tag] v0.1.9 -> v0.1.9 2025-12-04T08:57:44.2038374Z * [new tag] v0.2.0 -> v0.2.0 2025-12-04T08:57:44.2039275Z * [new tag] v0.3.0 -> v0.3.0 2025-12-04T08:57:44.2040215Z * [new tag] v0.3.1 -> v0.3.1 2025-12-04T08:57:44.2041045Z * [new tag] v0.4.0 -> v0.4.0 2025-12-04T08:57:44.2041794Z * [new tag] v0.4.1 -> v0.4.1 2025-12-04T08:57:44.2042669Z * [new tag] v1.0.0 -> v1.0.0 2025-12-04T08:57:44.2043561Z * [new tag] v1.0.0a0 -> v1.0.0a0 2025-12-04T08:57:44.2044331Z * [new tag] v1.0.1 -> v1.0.1 2025-12-04T08:57:44.2045233Z * [new tag] v1.0rc0 -> v1.0rc0 2025-12-04T08:57:44.2045947Z * [new tag] v1.0rc1 -> v1.0rc1 2025-12-04T08:57:44.2046717Z * [new tag] v1.1.0 -> v1.1.0 2025-12-04T08:57:44.2047615Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-12-04T08:57:44.2048692Z * [new tag] v1.10.0 -> v1.10.0 2025-12-04T08:57:44.2049627Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-12-04T08:57:44.2050495Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-12-04T08:57:44.2051268Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-12-04T08:57:44.2052251Z * [new tag] v1.10.1 -> v1.10.1 2025-12-04T08:57:44.2052951Z * [new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-12-04T08:57:44.2053561Z * [new tag] v1.10.2 -> v1.10.2 2025-12-04T08:57:44.2054223Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-12-04T08:57:44.2055158Z * [new tag] v1.11.0 -> v1.11.0 
2025-12-04T08:57:44.2056090Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-12-04T08:57:44.2057411Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-12-04T08:57:44.2058512Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-12-04T08:57:44.2059457Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-12-04T08:57:44.2060404Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-12-04T08:57:44.2061074Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-12-04T08:57:44.2061728Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 2025-12-04T08:57:44.2062833Z * [new tag] v1.12.0 -> v1.12.0 2025-12-04T08:57:44.2063722Z * [new tag] v1.12.0-rc1 -> v1.12.0-rc1 2025-12-04T08:57:44.2064719Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-12-04T08:57:44.2065614Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-12-04T08:57:44.2066491Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-12-04T08:57:44.2067419Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-12-04T08:57:44.2068475Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-12-04T08:57:44.2069242Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-12-04T08:57:44.2069926Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-12-04T08:57:44.2070590Z * [new tag] v1.12.1 -> v1.12.1 2025-12-04T08:57:44.2071624Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-12-04T08:57:44.2072540Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-12-04T08:57:44.2073517Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-12-04T08:57:44.2074434Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-12-04T08:57:44.2075018Z * [new tag] v1.12.1-rc5 -> v1.12.1-rc5 2025-12-04T08:57:44.2075998Z * [new tag] v1.13.0 -> v1.13.0 2025-12-04T08:57:44.2076865Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-12-04T08:57:44.2077737Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-12-04T08:57:44.2078581Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-12-04T08:57:44.2079607Z * [new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-12-04T08:57:44.2080261Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-12-04T08:57:44.2080921Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-12-04T08:57:44.2081908Z * [new tag] v1.13.1 -> v1.13.1 2025-12-04T08:57:44.2082594Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-12-04T08:57:44.2083455Z * [new tag] v1.2.0 -> v1.2.0 2025-12-04T08:57:44.2084308Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-12-04T08:57:44.2085146Z * [new tag] v1.3.0 -> v1.3.0 2025-12-04T08:57:44.2086098Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-12-04T08:57:44.2087194Z * [new tag] v1.3.1 -> v1.3.1 2025-12-04T08:57:44.2088026Z * [new tag] v1.4.0 -> v1.4.0 2025-12-04T08:57:44.2088882Z * [new tag] v1.4.0a0 -> v1.4.0a0 2025-12-04T08:57:44.2089524Z * [new tag] v1.4.1 -> v1.4.1 2025-12-04T08:57:44.2090544Z * [new tag] v1.5.0 -> v1.5.0 2025-12-04T08:57:44.2091463Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-12-04T08:57:44.2092334Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-12-04T08:57:44.2093315Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-12-04T08:57:44.2094034Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-12-04T08:57:44.2094713Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-12-04T08:57:44.2095716Z * [new tag] v1.5.1 -> v1.5.1 2025-12-04T08:57:44.2096459Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-12-04T08:57:44.2097393Z * [new tag] v1.6.0 -> v1.6.0 2025-12-04T08:57:44.2098419Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-12-04T08:57:44.2099436Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-12-04T08:57:44.2100364Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-12-04T08:57:44.2101328Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-12-04T08:57:44.2102200Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 2025-12-04T08:57:44.2103141Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-12-04T08:57:44.2103831Z * [new 
tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-12-04T08:57:44.2104759Z * [new tag] v1.7.0 -> v1.7.0 2025-12-04T08:57:44.2105683Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-12-04T08:57:44.2106756Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 2025-12-04T08:57:44.2107683Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-12-04T08:57:44.2108298Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-12-04T08:57:44.2109384Z * [new tag] v1.7.1 -> v1.7.1 2025-12-04T08:57:44.2110433Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-12-04T08:57:44.2111374Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-12-04T08:57:44.2112010Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-12-04T08:57:44.2112973Z * [new tag] v1.8.0 -> v1.8.0 2025-12-04T08:57:44.2113659Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-12-04T08:57:44.2114662Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-12-04T08:57:44.2115587Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-12-04T08:57:44.2116361Z * [new tag] v1.8.0-rc4 -> v1.8.0-rc4 2025-12-04T08:57:44.2117055Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 2025-12-04T08:57:44.2117748Z * [new tag] v1.8.1 -> v1.8.1 2025-12-04T08:57:44.2118696Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-12-04T08:57:44.2119381Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-12-04T08:57:44.2120007Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-12-04T08:57:44.2122066Z * [new tag] v1.8.2 -> v1.8.2 2025-12-04T08:57:44.2122667Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-12-04T08:57:44.2123601Z * [new tag] v1.9.0 -> v1.9.0 2025-12-04T08:57:44.2124530Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-12-04T08:57:44.2125496Z * [new tag] v1.9.0-rc2 -> v1.9.0-rc2 2025-12-04T08:57:44.2126474Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-12-04T08:57:44.2127120Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-12-04T08:57:44.2128089Z * [new tag] v1.9.1 -> v1.9.1 2025-12-04T08:57:44.2129249Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-12-04T08:57:44.2129852Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 2025-12-04T08:57:44.2130881Z * [new tag] v2.0.0 -> v2.0.0 2025-12-04T08:57:44.2131780Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-12-04T08:57:44.2132715Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-12-04T08:57:44.2133799Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-12-04T08:57:44.2134681Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 2025-12-04T08:57:44.2135604Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-12-04T08:57:44.2136254Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-12-04T08:57:44.2137615Z * [new tag] v2.0.1 -> v2.0.1 2025-12-04T08:57:44.2138644Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-12-04T08:57:44.2139182Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-12-04T08:57:44.2140075Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-12-04T08:57:44.2140821Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-12-04T08:57:44.2142344Z * [new tag] v2.1.0 -> v2.1.0 2025-12-04T08:57:44.2143271Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-12-04T08:57:44.2144290Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-12-04T08:57:44.2145743Z * [new tag] v2.1.0-rc3 -> v2.1.0-rc3 2025-12-04T08:57:44.2146761Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 2025-12-04T08:57:44.2147785Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-12-04T08:57:44.2148409Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-12-04T08:57:44.2149479Z * [new tag] v2.1.1 -> v2.1.1 2025-12-04T08:57:44.2150555Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-12-04T08:57:44.2151505Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-12-04T08:57:44.2152511Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-12-04T08:57:44.2153433Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-12-04T08:57:44.2154303Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-12-04T08:57:44.2154892Z * [new tag] 
v2.1.1-rc6 -> v2.1.1-rc6 2025-12-04T08:57:44.2155805Z * [new tag] v2.1.2 -> v2.1.2 2025-12-04T08:57:44.2156780Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-12-04T08:57:44.2157734Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-12-04T08:57:44.2158340Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-12-04T08:57:44.2159330Z * [new tag] v2.2.0 -> v2.2.0 2025-12-04T08:57:44.2160215Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-12-04T08:57:44.2161098Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-12-04T08:57:44.2161961Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-12-04T08:57:44.2162798Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-12-04T08:57:44.2163648Z * [new tag] v2.2.0-rc5 -> v2.2.0-rc5 2025-12-04T08:57:44.2164522Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-12-04T08:57:44.2165182Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-12-04T08:57:44.2165824Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-12-04T08:57:44.2166812Z * [new tag] v2.2.1 -> v2.2.1 2025-12-04T08:57:44.2167763Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-12-04T08:57:44.2168375Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-12-04T08:57:44.2169080Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-12-04T08:57:44.2169733Z * [new tag] v2.2.2 -> v2.2.2 2025-12-04T08:57:44.2170846Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-12-04T08:57:44.2171442Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-12-04T08:57:44.2172312Z * [new tag] v2.2.2-rc3 -> v2.2.2-rc3 2025-12-04T08:57:44.2173295Z * [new tag] v2.3.0 -> v2.3.0 2025-12-04T08:57:44.2174167Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-12-04T08:57:44.2175238Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-12-04T08:57:44.2176058Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-12-04T08:57:44.2177047Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-12-04T08:57:44.2178115Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-12-04T08:57:44.2179091Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-12-04T08:57:44.2180027Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-12-04T08:57:44.2180898Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-12-04T08:57:44.2181586Z * [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-12-04T08:57:44.2182541Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-12-04T08:57:44.2183505Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-12-04T08:57:44.2184180Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-12-04T08:57:44.2184827Z * [new tag] v2.3.1 -> v2.3.1 2025-12-04T08:57:44.2185832Z * [new tag] v2.3.1-rc1 -> v2.3.1-rc1 2025-12-04T08:57:44.2186788Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-12-04T08:57:44.2187729Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-12-04T08:57:44.2188786Z * [new tag] v2.4.0 -> v2.4.0 2025-12-04T08:57:44.2189760Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-12-04T08:57:44.2190631Z * [new tag] v2.4.0-rc2 -> v2.4.0-rc2 2025-12-04T08:57:44.2191512Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-12-04T08:57:44.2192410Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-12-04T08:57:44.2193380Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-12-04T08:57:44.2194288Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-12-04T08:57:44.2195220Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-12-04T08:57:44.2196067Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-12-04T08:57:44.2197042Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-12-04T08:57:44.2197695Z * [new tag] v2.4.1 -> v2.4.1 2025-12-04T08:57:44.2198711Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-12-04T08:57:44.2199665Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 2025-12-04T08:57:44.2200579Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-12-04T08:57:44.2201584Z * [new tag] v2.5.0 -> v2.5.0 2025-12-04T08:57:44.2202880Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-12-04T08:57:44.2203543Z * 
[new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-12-04T08:57:44.2204502Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-12-04T08:57:44.2205349Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-12-04T08:57:44.2206262Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-12-04T08:57:44.2207141Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-12-04T08:57:44.2208170Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-12-04T08:57:44.2209073Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 2025-12-04T08:57:44.2209976Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-12-04T08:57:44.2210934Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-12-04T08:57:44.2211512Z * [new tag] v2.5.1 -> v2.5.1 2025-12-04T08:57:44.2212304Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-12-04T08:57:44.2212899Z * [new tag] v2.6.0 -> v2.6.0 2025-12-04T08:57:44.2213933Z * [new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-12-04T08:57:44.2214927Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-12-04T08:57:44.2215834Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-12-04T08:57:44.2217005Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-12-04T08:57:44.2218224Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 2025-12-04T08:57:44.2219276Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-12-04T08:57:44.2220266Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-12-04T08:57:44.2221539Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-12-04T08:57:44.2222560Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-12-04T08:57:44.2223759Z * [new tag] v2.7.0 -> v2.7.0 2025-12-04T08:57:44.2224698Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-12-04T08:57:44.2225363Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-12-04T08:57:44.2226470Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-12-04T08:57:44.2227474Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-12-04T08:57:44.2228419Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-12-04T08:57:44.2229360Z * [new tag] v2.7.0-rc5 -> v2.7.0-rc5 2025-12-04T08:57:44.2230253Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 2025-12-04T08:57:44.2231219Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-12-04T08:57:44.2232397Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-12-04T08:57:44.2233570Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-12-04T08:57:44.2234242Z * [new tag] v2.7.1 -> v2.7.1 2025-12-04T08:57:44.2235300Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-12-04T08:57:44.2236263Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-12-04T08:57:44.2237258Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-12-04T08:57:44.2238225Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-12-04T08:57:44.2239172Z * [new tag] v2.7.1-rc5 -> v2.7.1-rc5 2025-12-04T08:57:44.2239832Z * [new tag] v2.8.0 -> v2.8.0 2025-12-04T08:57:44.2240813Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-12-04T08:57:44.2241719Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-12-04T08:57:44.2242698Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-12-04T08:57:44.2243766Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 2025-12-04T08:57:44.2244705Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-12-04T08:57:44.2245659Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-12-04T08:57:44.2246606Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-12-04T08:57:44.2247505Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-12-04T08:57:44.2248501Z * [new tag] v2.9.0 -> v2.9.0 2025-12-04T08:57:44.2249461Z * [new tag] v2.9.0-rc1 -> v2.9.0-rc1 2025-12-04T08:57:44.2250419Z * [new tag] v2.9.0-rc10 -> v2.9.0-rc10 2025-12-04T08:57:44.2251393Z * [new tag] v2.9.0-rc11 -> v2.9.0-rc11 2025-12-04T08:57:44.2252710Z * [new tag] v2.9.0-rc2 -> v2.9.0-rc2 2025-12-04T08:57:44.2253664Z * [new tag] v2.9.0-rc3 -> v2.9.0-rc3 2025-12-04T08:57:44.2254481Z * [new tag] v2.9.0-rc4 -> v2.9.0-rc4 2025-12-04T08:57:44.2255443Z * [new tag] v2.9.0-rc5 -> 
v2.9.0-rc5 2025-12-04T08:57:44.2256912Z * [new tag] v2.9.0-rc6 -> v2.9.0-rc6 2025-12-04T08:57:44.2257972Z * [new tag] v2.9.0-rc7 -> v2.9.0-rc7 2025-12-04T08:57:44.2259164Z * [new tag] v2.9.0-rc8 -> v2.9.0-rc8 2025-12-04T08:57:44.2259864Z * [new tag] v2.9.0-rc9 -> v2.9.0-rc9 2025-12-04T08:57:44.2260571Z * [new tag] v2.9.1 -> v2.9.1 2025-12-04T08:57:44.2261638Z * [new tag] v2.9.1-rc1 -> v2.9.1-rc1 2025-12-04T08:57:44.2262605Z * [new tag] v2.9.1-rc2 -> v2.9.1-rc2 2025-12-04T08:57:44.2263978Z * [new tag] viable/strict/1759343184 -> viable/strict/1759343184 2025-12-04T08:57:44.2264852Z * [new tag] viable/strict/1759346540 -> viable/strict/1759346540 2025-12-04T08:57:44.2265700Z * [new tag] viable/strict/1759348181 -> viable/strict/1759348181 2025-12-04T08:57:44.2266646Z * [new tag] viable/strict/1759350324 -> viable/strict/1759350324 2025-12-04T08:57:44.2267476Z * [new tag] viable/strict/1759351793 -> viable/strict/1759351793 2025-12-04T08:57:44.2268427Z * [new tag] viable/strict/1759353844 -> viable/strict/1759353844 2025-12-04T08:57:44.2269317Z * [new tag] viable/strict/1759355374 -> viable/strict/1759355374 2025-12-04T08:57:44.2270216Z * [new tag] viable/strict/1759357472 -> viable/strict/1759357472 2025-12-04T08:57:44.2270991Z * [new tag] viable/strict/1759361002 -> viable/strict/1759361002 2025-12-04T08:57:44.2272146Z * [new tag] viable/strict/1759362585 -> viable/strict/1759362585 2025-12-04T08:57:44.2273192Z * [new tag] viable/strict/1759365359 -> viable/strict/1759365359 2025-12-04T08:57:44.2274508Z * [new tag] viable/strict/1759370089 -> viable/strict/1759370089 2025-12-04T08:57:44.2275439Z * [new tag] viable/strict/1759377554 -> viable/strict/1759377554 2025-12-04T08:57:44.2276444Z * [new tag] viable/strict/1759379133 -> viable/strict/1759379133 2025-12-04T08:57:44.2277360Z * [new tag] viable/strict/1759389871 -> viable/strict/1759389871 2025-12-04T08:57:44.2278195Z * [new tag] viable/strict/1759393562 -> viable/strict/1759393562 2025-12-04T08:57:44.2279131Z * [new tag] viable/strict/1759395076 -> viable/strict/1759395076 2025-12-04T08:57:44.2280124Z * [new tag] viable/strict/1759398579 -> viable/strict/1759398579 2025-12-04T08:57:44.2280950Z * [new tag] viable/strict/1759404142 -> viable/strict/1759404142 2025-12-04T08:57:44.2281886Z * [new tag] viable/strict/1759405773 -> viable/strict/1759405773 2025-12-04T08:57:44.2282706Z * [new tag] viable/strict/1759408041 -> viable/strict/1759408041 2025-12-04T08:57:44.2283615Z * [new tag] viable/strict/1759411593 -> viable/strict/1759411593 2025-12-04T08:57:44.2284425Z * [new tag] viable/strict/1759427395 -> viable/strict/1759427395 2025-12-04T08:57:44.2285399Z * [new tag] viable/strict/1759434582 -> viable/strict/1759434582 2025-12-04T08:57:44.2286322Z * [new tag] viable/strict/1759436720 -> viable/strict/1759436720 2025-12-04T08:57:44.2287233Z * [new tag] viable/strict/1759440219 -> viable/strict/1759440219 2025-12-04T08:57:44.2288044Z * [new tag] viable/strict/1759441948 -> viable/strict/1759441948 2025-12-04T08:57:44.2289068Z * [new tag] viable/strict/1759443860 -> viable/strict/1759443860 2025-12-04T08:57:44.2289789Z * [new tag] viable/strict/1759445377 -> viable/strict/1759445377 2025-12-04T08:57:44.2290808Z * [new tag] viable/strict/1759447415 -> viable/strict/1759447415 2025-12-04T08:57:44.2291766Z * [new tag] viable/strict/1759451750 -> viable/strict/1759451750 2025-12-04T08:57:44.2292725Z * [new tag] viable/strict/1759453910 -> viable/strict/1759453910 2025-12-04T08:57:44.2293548Z * [new tag] viable/strict/1759456483 -> 
viable/strict/1759456483 2025-12-04T08:57:44.2294529Z * [new tag] viable/strict/1759459279 -> viable/strict/1759459279 2025-12-04T08:57:44.2295449Z * [new tag] viable/strict/1759460742 -> viable/strict/1759460742 2025-12-04T08:57:44.2296406Z * [new tag] viable/strict/1759462025 -> viable/strict/1759462025 2025-12-04T08:57:44.2297663Z * [new tag] viable/strict/1759469086 -> viable/strict/1759469086 2025-12-04T08:57:44.2298407Z * [new tag] viable/strict/1759470581 -> viable/strict/1759470581 2025-12-04T08:57:44.2299388Z * [new tag] viable/strict/1759472786 -> viable/strict/1759472786 2025-12-04T08:57:44.2300345Z * [new tag] viable/strict/1759476294 -> viable/strict/1759476294 2025-12-04T08:57:44.2301171Z * [new tag] viable/strict/1759479963 -> viable/strict/1759479963 2025-12-04T08:57:44.2302116Z * [new tag] viable/strict/1759492177 -> viable/strict/1759492177 2025-12-04T08:57:44.2303049Z * [new tag] viable/strict/1759519278 -> viable/strict/1759519278 2025-12-04T08:57:44.2303997Z * [new tag] viable/strict/1759524580 -> viable/strict/1759524580 2025-12-04T08:57:44.2304802Z * [new tag] viable/strict/1759528193 -> viable/strict/1759528193 2025-12-04T08:57:44.2306007Z * [new tag] viable/strict/1759533797 -> viable/strict/1759533797 2025-12-04T08:57:44.2306858Z * [new tag] viable/strict/1759542780 -> viable/strict/1759542780 2025-12-04T08:57:44.2307848Z * [new tag] viable/strict/1759549779 -> viable/strict/1759549779 2025-12-04T08:57:44.2308895Z * [new tag] viable/strict/1759555455 -> viable/strict/1759555455 2025-12-04T08:57:44.2309805Z * [new tag] viable/strict/1759559176 -> viable/strict/1759559176 2025-12-04T08:57:44.2310744Z * [new tag] viable/strict/1759560629 -> viable/strict/1759560629 2025-12-04T08:57:44.2311534Z * [new tag] viable/strict/1759569848 -> viable/strict/1759569848 2025-12-04T08:57:44.2312646Z * [new tag] viable/strict/1759571382 -> viable/strict/1759571382 2025-12-04T08:57:44.2313461Z * [new tag] viable/strict/1759573474 -> viable/strict/1759573474 2025-12-04T08:57:44.2314408Z * [new tag] viable/strict/1759618187 -> viable/strict/1759618187 2025-12-04T08:57:44.2315344Z * [new tag] viable/strict/1759626742 -> viable/strict/1759626742 2025-12-04T08:57:44.2316168Z * [new tag] viable/strict/1759632427 -> viable/strict/1759632427 2025-12-04T08:57:44.2317073Z * [new tag] viable/strict/1759634971 -> viable/strict/1759634971 2025-12-04T08:57:44.2318010Z * [new tag] viable/strict/1759661382 -> viable/strict/1759661382 2025-12-04T08:57:44.2318966Z * [new tag] viable/strict/1759663294 -> viable/strict/1759663294 2025-12-04T08:57:44.2319657Z * [new tag] viable/strict/1759708178 -> viable/strict/1759708178 2025-12-04T08:57:44.2320607Z * [new tag] viable/strict/1759715695 -> viable/strict/1759715695 2025-12-04T08:57:44.2322217Z * [new tag] viable/strict/1759728293 -> viable/strict/1759728293 2025-12-04T08:57:44.2322957Z * [new tag] viable/strict/1759735513 -> viable/strict/1759735513 2025-12-04T08:57:44.2324016Z * [new tag] viable/strict/1759739177 -> viable/strict/1759739177 2025-12-04T08:57:44.2324949Z * [new tag] viable/strict/1759758635 -> viable/strict/1759758635 2025-12-04T08:57:44.2325887Z * [new tag] viable/strict/1759765784 -> viable/strict/1759765784 2025-12-04T08:57:44.2326742Z * [new tag] viable/strict/1759767948 -> viable/strict/1759767948 2025-12-04T08:57:44.2327736Z * [new tag] viable/strict/1759771461 -> viable/strict/1759771461 2025-12-04T08:57:44.2328465Z * [new tag] viable/strict/1759776706 -> viable/strict/1759776706 2025-12-04T08:57:44.2329556Z * [new tag] 
viable/strict/1759782317 -> viable/strict/1759782317 2025-12-04T08:57:44.2330607Z * [new tag] viable/strict/1759783777 -> viable/strict/1759783777 2025-12-04T08:57:44.2331582Z * [new tag] viable/strict/1759785815 -> viable/strict/1759785815 2025-12-04T08:57:44.2332431Z * [new tag] viable/strict/1759789459 -> viable/strict/1759789459 2025-12-04T08:57:44.2333483Z * [new tag] viable/strict/1759790974 -> viable/strict/1759790974 2025-12-04T08:57:44.2334683Z * [new tag] viable/strict/1759794583 -> viable/strict/1759794583 2025-12-04T08:57:44.2335620Z * [new tag] viable/strict/1759797408 -> viable/strict/1759797408 2025-12-04T08:57:44.2336602Z * [new tag] viable/strict/1759799518 -> viable/strict/1759799518 2025-12-04T08:57:44.2337761Z * [new tag] viable/strict/1759804909 -> viable/strict/1759804909 2025-12-04T08:57:44.2338695Z * [new tag] viable/strict/1759807643 -> viable/strict/1759807643 2025-12-04T08:57:44.2339626Z * [new tag] viable/strict/1759809089 -> viable/strict/1759809089 2025-12-04T08:57:44.2340572Z * [new tag] viable/strict/1759811145 -> viable/strict/1759811145 2025-12-04T08:57:44.2341514Z * [new tag] viable/strict/1759812581 -> viable/strict/1759812581 2025-12-04T08:57:44.2342349Z * [new tag] viable/strict/1759814683 -> viable/strict/1759814683 2025-12-04T08:57:44.2343330Z * [new tag] viable/strict/1759821889 -> viable/strict/1759821889 2025-12-04T08:57:44.2344281Z * [new tag] viable/strict/1759823376 -> viable/strict/1759823376 2025-12-04T08:57:44.2345252Z * [new tag] viable/strict/1759827107 -> viable/strict/1759827107 2025-12-04T08:57:44.2346052Z * [new tag] viable/strict/1759830577 -> viable/strict/1759830577 2025-12-04T08:57:44.2347113Z * [new tag] viable/strict/1759832720 -> viable/strict/1759832720 2025-12-04T08:57:44.2347955Z * [new tag] viable/strict/1759842063 -> viable/strict/1759842063 2025-12-04T08:57:44.2349034Z * [new tag] viable/strict/1759847121 -> viable/strict/1759847121 2025-12-04T08:57:44.2350233Z * [new tag] viable/strict/1759850721 -> viable/strict/1759850721 2025-12-04T08:57:44.2351045Z * [new tag] viable/strict/1759857870 -> viable/strict/1759857870 2025-12-04T08:57:44.2352058Z * [new tag] viable/strict/1759863143 -> viable/strict/1759863143 2025-12-04T08:57:44.2353025Z * [new tag] viable/strict/1759875874 -> viable/strict/1759875874 2025-12-04T08:57:44.2353701Z * [new tag] viable/strict/1759877385 -> viable/strict/1759877385 2025-12-04T08:57:44.2354644Z * [new tag] viable/strict/1759883801 -> viable/strict/1759883801 2025-12-04T08:57:44.2355463Z * [new tag] viable/strict/1759885922 -> viable/strict/1759885922 2025-12-04T08:57:44.2356480Z * [new tag] viable/strict/1759888488 -> viable/strict/1759888488 2025-12-04T08:57:44.2357210Z * [new tag] viable/strict/1759895471 -> viable/strict/1759895471 2025-12-04T08:57:44.2358092Z * [new tag] viable/strict/1759904803 -> viable/strict/1759904803 2025-12-04T08:57:44.2359243Z * [new tag] viable/strict/1759908300 -> viable/strict/1759908300 2025-12-04T08:57:44.2360299Z * [new tag] viable/strict/1759915520 -> viable/strict/1759915520 2025-12-04T08:57:44.2361086Z * [new tag] viable/strict/1759916978 -> viable/strict/1759916978 2025-12-04T08:57:44.2361828Z * [new tag] viable/strict/1759930024 -> viable/strict/1759930024 2025-12-04T08:57:44.2362816Z * [new tag] viable/strict/1759948122 -> viable/strict/1759948122 2025-12-04T08:57:44.2363723Z * [new tag] viable/strict/1759952983 -> viable/strict/1759952983 2025-12-04T08:57:44.2364753Z * [new tag] viable/strict/1759955121 -> viable/strict/1759955121 
2025-12-04T08:57:44.2365526Z * [new tag] viable/strict/1759962298 -> viable/strict/1759962298 2025-12-04T08:57:44.2366472Z * [new tag] viable/strict/1759965837 -> viable/strict/1759965837 2025-12-04T08:57:44.2367291Z * [new tag] viable/strict/1759970213 -> viable/strict/1759970213 2025-12-04T08:57:44.2368260Z * [new tag] viable/strict/1759974894 -> viable/strict/1759974894 2025-12-04T08:57:44.2369069Z * [new tag] viable/strict/1759977763 -> viable/strict/1759977763 2025-12-04T08:57:44.2370175Z * [new tag] viable/strict/1759979241 -> viable/strict/1759979241 2025-12-04T08:57:44.2371073Z * [new tag] viable/strict/1759985417 -> viable/strict/1759985417 2025-12-04T08:57:44.2371900Z * [new tag] viable/strict/1759987490 -> viable/strict/1759987490 2025-12-04T08:57:44.2372851Z * [new tag] viable/strict/1759996180 -> viable/strict/1759996180 2025-12-04T08:57:44.2373730Z * [new tag] viable/strict/1760065682 -> viable/strict/1760065682 2025-12-04T08:57:44.2374652Z * [new tag] viable/strict/1760066894 -> viable/strict/1760066894 2025-12-04T08:57:44.2375572Z * [new tag] viable/strict/1760070345 -> viable/strict/1760070345 2025-12-04T08:57:44.2376454Z * [new tag] viable/strict/1760089782 -> viable/strict/1760089782 2025-12-04T08:57:44.2377728Z * [new tag] viable/strict/1760091921 -> viable/strict/1760091921 2025-12-04T08:57:44.2378689Z * [new tag] viable/strict/1760127924 -> viable/strict/1760127924 2025-12-04T08:57:44.2379619Z * [new tag] viable/strict/1760129489 -> viable/strict/1760129489 2025-12-04T08:57:44.2380611Z * [new tag] viable/strict/1760132980 -> viable/strict/1760132980 2025-12-04T08:57:44.2381602Z * [new tag] viable/strict/1760135060 -> viable/strict/1760135060 2025-12-04T08:57:44.2382623Z * [new tag] viable/strict/1760215782 -> viable/strict/1760215782 2025-12-04T08:57:44.2383586Z * [new tag] viable/strict/1760273849 -> viable/strict/1760273849 2025-12-04T08:57:44.2384625Z * [new tag] viable/strict/1760275517 -> viable/strict/1760275517 2025-12-04T08:57:44.2385375Z * [new tag] viable/strict/1760276979 -> viable/strict/1760276979 2025-12-04T08:57:44.2386331Z * [new tag] viable/strict/1760279007 -> viable/strict/1760279007 2025-12-04T08:57:44.2387046Z * [new tag] viable/strict/1760286328 -> viable/strict/1760286328 2025-12-04T08:57:44.2387812Z * [new tag] viable/strict/1760493304 -> viable/strict/1760493304 2025-12-04T08:57:44.2388911Z * [new tag] viable/strict/1760496298 -> viable/strict/1760496298 2025-12-04T08:57:44.2389938Z * [new tag] viable/strict/1760518396 -> viable/strict/1760518396 2025-12-04T08:57:44.2390636Z * [new tag] viable/strict/1760534864 -> viable/strict/1760534864 2025-12-04T08:57:44.2391612Z * [new tag] viable/strict/1760549062 -> viable/strict/1760549062 2025-12-04T08:57:44.2392606Z * [new tag] viable/strict/1760552799 -> viable/strict/1760552799 2025-12-04T08:57:44.2393535Z * [new tag] viable/strict/1760554355 -> viable/strict/1760554355 2025-12-04T08:57:44.2394855Z * [new tag] viable/strict/1760556275 -> viable/strict/1760556275 2025-12-04T08:57:44.2395741Z * [new tag] viable/strict/1760564979 -> viable/strict/1760564979 2025-12-04T08:57:44.2396744Z * [new tag] viable/strict/1760567049 -> viable/strict/1760567049 2025-12-04T08:57:44.2398047Z * [new tag] viable/strict/1760568585 -> viable/strict/1760568585 2025-12-04T08:57:44.2398957Z * [new tag] viable/strict/1760570630 -> viable/strict/1760570630 2025-12-04T08:57:44.2399927Z * [new tag] viable/strict/1760572180 -> viable/strict/1760572180 2025-12-04T08:57:44.2400717Z * [new tag] viable/strict/1760575094 -> 
viable/strict/1760575094 2025-12-04T08:57:44.2401804Z * [new tag] viable/strict/1760579709 -> viable/strict/1760579709 2025-12-04T08:57:44.2403132Z * [new tag] viable/strict/1760582614 -> viable/strict/1760582614 2025-12-04T08:57:44.2404073Z * [new tag] viable/strict/1760586815 -> viable/strict/1760586815 2025-12-04T08:57:44.2404769Z * [new tag] viable/strict/1760588829 -> viable/strict/1760588829 2025-12-04T08:57:44.2405714Z * [new tag] viable/strict/1760590200 -> viable/strict/1760590200 2025-12-04T08:57:44.2406737Z * [new tag] viable/strict/1760592311 -> viable/strict/1760592311 2025-12-04T08:57:44.2407560Z * [new tag] viable/strict/1760619733 -> viable/strict/1760619733 2025-12-04T08:57:44.2408318Z * [new tag] viable/strict/1760628335 -> viable/strict/1760628335 2025-12-04T08:57:44.2409221Z * [new tag] viable/strict/1760635490 -> viable/strict/1760635490 2025-12-04T08:57:44.2410034Z * [new tag] viable/strict/1760640743 -> viable/strict/1760640743 2025-12-04T08:57:44.2410974Z * [new tag] viable/strict/1760642528 -> viable/strict/1760642528 2025-12-04T08:57:44.2411792Z * [new tag] viable/strict/1760646330 -> viable/strict/1760646330 2025-12-04T08:57:44.2412819Z * [new tag] viable/strict/1760666101 -> viable/strict/1760666101 2025-12-04T08:57:44.2413762Z * [new tag] viable/strict/1760668990 -> viable/strict/1760668990 2025-12-04T08:57:44.2414584Z * [new tag] viable/strict/1760670600 -> viable/strict/1760670600 2025-12-04T08:57:44.2415585Z * [new tag] viable/strict/1760671704 -> viable/strict/1760671704 2025-12-04T08:57:44.2416598Z * [new tag] viable/strict/1760673121 -> viable/strict/1760673121 2025-12-04T08:57:44.2417789Z * [new tag] viable/strict/1760675352 -> viable/strict/1760675352 2025-12-04T08:57:44.2418722Z * [new tag] viable/strict/1760696731 -> viable/strict/1760696731 2025-12-04T08:57:44.2421537Z * [new tag] viable/strict/1760723515 -> viable/strict/1760723515 2025-12-04T08:57:44.2422460Z * [new tag] viable/strict/1760727234 -> viable/strict/1760727234 2025-12-04T08:57:44.2423472Z * [new tag] viable/strict/1760730578 -> viable/strict/1760730578 2025-12-04T08:57:44.2424395Z * [new tag] viable/strict/1760732726 -> viable/strict/1760732726 2025-12-04T08:57:44.2425325Z * [new tag] viable/strict/1760734180 -> viable/strict/1760734180 2025-12-04T08:57:44.2426466Z * [new tag] viable/strict/1760736251 -> viable/strict/1760736251 2025-12-04T08:57:44.2427289Z * [new tag] viable/strict/1760737772 -> viable/strict/1760737772 2025-12-04T08:57:44.2428244Z * [new tag] viable/strict/1760758005 -> viable/strict/1760758005 2025-12-04T08:57:44.2429200Z * [new tag] viable/strict/1760761532 -> viable/strict/1760761532 2025-12-04T08:57:44.2430151Z * [new tag] viable/strict/1760802581 -> viable/strict/1760802581 2025-12-04T08:57:44.2430989Z * [new tag] viable/strict/1760827772 -> viable/strict/1760827772 2025-12-04T08:57:44.2431950Z * [new tag] viable/strict/1760834524 -> viable/strict/1760834524 2025-12-04T08:57:44.2433089Z * [new tag] viable/strict/1760845009 -> viable/strict/1760845009 2025-12-04T08:57:44.2434005Z * [new tag] viable/strict/1760876836 -> viable/strict/1760876836 2025-12-04T08:57:44.2434975Z * [new tag] viable/strict/1760880329 -> viable/strict/1760880329 2025-12-04T08:57:44.2435708Z * [new tag] viable/strict/1760888987 -> viable/strict/1760888987 2025-12-04T08:57:44.2436643Z * [new tag] viable/strict/1760912664 -> viable/strict/1760912664 2025-12-04T08:57:44.2437472Z * [new tag] viable/strict/1760925321 -> viable/strict/1760925321 2025-12-04T08:57:44.2438379Z * [new tag] 
viable/strict/1760931488 -> viable/strict/1760931488 2025-12-04T08:57:44.2439288Z * [new tag] viable/strict/1760932693 -> viable/strict/1760932693 2025-12-04T08:57:44.2440283Z * [new tag] viable/strict/1761004184 -> viable/strict/1761004184 2025-12-04T08:57:44.2441217Z * [new tag] viable/strict/1761014748 -> viable/strict/1761014748 2025-12-04T08:57:44.2442041Z * [new tag] viable/strict/1761017491 -> viable/strict/1761017491 2025-12-04T08:57:44.2443019Z * [new tag] viable/strict/1761018806 -> viable/strict/1761018806 2025-12-04T08:57:44.2444002Z * [new tag] viable/strict/1761020754 -> viable/strict/1761020754 2025-12-04T08:57:44.2444978Z * [new tag] viable/strict/1761024303 -> viable/strict/1761024303 2025-12-04T08:57:44.2445801Z * [new tag] viable/strict/1761029582 -> viable/strict/1761029582 2025-12-04T08:57:44.2446733Z * [new tag] viable/strict/1761031535 -> viable/strict/1761031535 2025-12-04T08:57:44.2447553Z * [new tag] viable/strict/1761035196 -> viable/strict/1761035196 2025-12-04T08:57:44.2448604Z * [new tag] viable/strict/1761045825 -> viable/strict/1761045825 2025-12-04T08:57:44.2449543Z * [new tag] viable/strict/1761054796 -> viable/strict/1761054796 2025-12-04T08:57:44.2450451Z * [new tag] viable/strict/1761060314 -> viable/strict/1761060314 2025-12-04T08:57:44.2451364Z * [new tag] viable/strict/1761071198 -> viable/strict/1761071198 2025-12-04T08:57:44.2452339Z * [new tag] viable/strict/1761074628 -> viable/strict/1761074628 2025-12-04T08:57:44.2453252Z * [new tag] viable/strict/1761078351 -> viable/strict/1761078351 2025-12-04T08:57:44.2454173Z * [new tag] viable/strict/1761079822 -> viable/strict/1761079822 2025-12-04T08:57:44.2454957Z * [new tag] viable/strict/1761081873 -> viable/strict/1761081873 2025-12-04T08:57:44.2455938Z * [new tag] viable/strict/1761083392 -> viable/strict/1761083392 2025-12-04T08:57:44.2457663Z * [new tag] viable/strict/1761085465 -> viable/strict/1761085465 2025-12-04T08:57:44.2458624Z * [new tag] viable/strict/1761089099 -> viable/strict/1761089099 2025-12-04T08:57:44.2459448Z * [new tag] viable/strict/1761095535 -> viable/strict/1761095535 2025-12-04T08:57:44.2460487Z * [new tag] viable/strict/1761098119 -> viable/strict/1761098119 2025-12-04T08:57:44.2461927Z * [new tag] viable/strict/1761101330 -> viable/strict/1761101330 2025-12-04T08:57:44.2462879Z * [new tag] viable/strict/1761114425 -> viable/strict/1761114425 2025-12-04T08:57:44.2463793Z * [new tag] viable/strict/1761116036 -> viable/strict/1761116036 2025-12-04T08:57:44.2464756Z * [new tag] viable/strict/1761119379 -> viable/strict/1761119379 2025-12-04T08:57:44.2465692Z * [new tag] viable/strict/1761121601 -> viable/strict/1761121601 2025-12-04T08:57:44.2466539Z * [new tag] viable/strict/1761123234 -> viable/strict/1761123234 2025-12-04T08:57:44.2467466Z * [new tag] viable/strict/1761126621 -> viable/strict/1761126621 2025-12-04T08:57:44.2468445Z * [new tag] viable/strict/1761132259 -> viable/strict/1761132259 2025-12-04T08:57:44.2469385Z * [new tag] viable/strict/1761146746 -> viable/strict/1761146746 2025-12-04T08:57:44.2470320Z * [new tag] viable/strict/1761164752 -> viable/strict/1761164752 2025-12-04T08:57:44.2471242Z * [new tag] viable/strict/1761166198 -> viable/strict/1761166198 2025-12-04T08:57:44.2472147Z * [new tag] viable/strict/1761175424 -> viable/strict/1761175424 2025-12-04T08:57:44.2473072Z * [new tag] viable/strict/1761176983 -> viable/strict/1761176983 2025-12-04T08:57:44.2474179Z * [new tag] viable/strict/1761179891 -> viable/strict/1761179891 
2025-12-04T08:57:44.2475070Z * [new tag] viable/strict/1761181930 -> viable/strict/1761181930 2025-12-04T08:57:44.2476076Z * [new tag] viable/strict/1761184516 -> viable/strict/1761184516 2025-12-04T08:57:44.2477039Z * [new tag] viable/strict/1761190179 -> viable/strict/1761190179 2025-12-04T08:57:44.2477869Z * [new tag] viable/strict/1761193558 -> viable/strict/1761193558 2025-12-04T08:57:44.2478816Z * [new tag] viable/strict/1761207990 -> viable/strict/1761207990 2025-12-04T08:57:44.2479756Z * [new tag] viable/strict/1761229539 -> viable/strict/1761229539 2025-12-04T08:57:44.2480922Z * [new tag] viable/strict/1761244031 -> viable/strict/1761244031 2025-12-04T08:57:44.2481844Z * [new tag] viable/strict/1761248986 -> viable/strict/1761248986 2025-12-04T08:57:44.2482735Z * [new tag] viable/strict/1761259791 -> viable/strict/1761259791 2025-12-04T08:57:44.2483566Z * [new tag] viable/strict/1761266139 -> viable/strict/1761266139 2025-12-04T08:57:44.2484606Z * [new tag] viable/strict/1761268316 -> viable/strict/1761268316 2025-12-04T08:57:44.2485410Z * [new tag] viable/strict/1761273805 -> viable/strict/1761273805 2025-12-04T08:57:44.2486340Z * [new tag] viable/strict/1761275261 -> viable/strict/1761275261 2025-12-04T08:57:44.2487300Z * [new tag] viable/strict/1761277913 -> viable/strict/1761277913 2025-12-04T08:57:44.2488276Z * [new tag] viable/strict/1761290701 -> viable/strict/1761290701 2025-12-04T08:57:44.2489228Z * [new tag] viable/strict/1761294396 -> viable/strict/1761294396 2025-12-04T08:57:44.2490132Z * [new tag] viable/strict/1761303047 -> viable/strict/1761303047 2025-12-04T08:57:44.2491051Z * [new tag] viable/strict/1761335388 -> viable/strict/1761335388 2025-12-04T08:57:44.2491950Z * [new tag] viable/strict/1761337551 -> viable/strict/1761337551 2025-12-04T08:57:44.2492855Z * [new tag] viable/strict/1761339007 -> viable/strict/1761339007 2025-12-04T08:57:44.2493662Z * [new tag] viable/strict/1761341050 -> viable/strict/1761341050 2025-12-04T08:57:44.2494696Z * [new tag] viable/strict/1761346188 -> viable/strict/1761346188 2025-12-04T08:57:44.2495782Z * [new tag] viable/strict/1761349792 -> viable/strict/1761349792 2025-12-04T08:57:44.2496930Z * [new tag] viable/strict/1761352620 -> viable/strict/1761352620 2025-12-04T08:57:44.2497899Z * [new tag] viable/strict/1761354730 -> viable/strict/1761354730 2025-12-04T08:57:44.2498869Z * [new tag] viable/strict/1761357298 -> viable/strict/1761357298 2025-12-04T08:57:44.2499787Z * [new tag] viable/strict/1761360201 -> viable/strict/1761360201 2025-12-04T08:57:44.2500783Z * [new tag] viable/strict/1761361753 -> viable/strict/1761361753 2025-12-04T08:57:44.2501734Z * [new tag] viable/strict/1761364351 -> viable/strict/1761364351 2025-12-04T08:57:44.2502570Z * [new tag] viable/strict/1761366338 -> viable/strict/1761366338 2025-12-04T08:57:44.2503703Z * [new tag] viable/strict/1761367802 -> viable/strict/1761367802 2025-12-04T08:57:44.2504657Z * [new tag] viable/strict/1761369889 -> viable/strict/1761369889 2025-12-04T08:57:44.2505628Z * [new tag] viable/strict/1761371385 -> viable/strict/1761371385 2025-12-04T08:57:44.2506688Z * [new tag] viable/strict/1761373581 -> viable/strict/1761373581 2025-12-04T08:57:44.2507781Z * [new tag] viable/strict/1761375054 -> viable/strict/1761375054 2025-12-04T08:57:44.2508914Z * [new tag] viable/strict/1761421785 -> viable/strict/1761421785 2025-12-04T08:57:44.2509911Z * [new tag] viable/strict/1761434614 -> viable/strict/1761434614 2025-12-04T08:57:44.2511138Z * [new tag] viable/strict/1761439254 -> 
viable/strict/1761439254 2025-12-04T08:57:44.2512176Z * [new tag] viable/strict/1761454187 -> viable/strict/1761454187 2025-12-04T08:57:44.2513126Z * [new tag] viable/strict/1761459991 -> viable/strict/1761459991 2025-12-04T08:57:44.2514186Z * [new tag] viable/strict/1761470668 -> viable/strict/1761470668 2025-12-04T08:57:44.2515534Z * [new tag] viable/strict/1761472188 -> viable/strict/1761472188 2025-12-04T08:57:44.2516497Z * [new tag] viable/strict/1761503178 -> viable/strict/1761503178 2025-12-04T08:57:44.2517423Z * [new tag] viable/strict/1761517492 -> viable/strict/1761517492 2025-12-04T08:57:44.2518324Z * [new tag] viable/strict/1761518981 -> viable/strict/1761518981 2025-12-04T08:57:44.2519333Z * [new tag] viable/strict/1761533609 -> viable/strict/1761533609 2025-12-04T08:57:44.2520540Z * [new tag] viable/strict/1761546438 -> viable/strict/1761546438 2025-12-04T08:57:44.2522109Z * [new tag] viable/strict/1761548133 -> viable/strict/1761548133 2025-12-04T08:57:44.2523243Z * [new tag] viable/strict/1761555186 -> viable/strict/1761555186 2025-12-04T08:57:44.2524429Z * [new tag] viable/strict/1761557178 -> viable/strict/1761557178 2025-12-04T08:57:44.2525439Z * [new tag] viable/strict/1761560772 -> viable/strict/1761560772 2025-12-04T08:57:44.2526356Z * [new tag] viable/strict/1761562266 -> viable/strict/1761562266 2025-12-04T08:57:44.2527365Z * [new tag] viable/strict/1761564260 -> viable/strict/1761564260 2025-12-04T08:57:44.2528315Z * [new tag] viable/strict/1761568072 -> viable/strict/1761568072 2025-12-04T08:57:44.2529249Z * [new tag] viable/strict/1761571683 -> viable/strict/1761571683 2025-12-04T08:57:44.2530012Z * [new tag] viable/strict/1761580199 -> viable/strict/1761580199 2025-12-04T08:57:44.2531049Z * [new tag] viable/strict/1761587383 -> viable/strict/1761587383 2025-12-04T08:57:44.2532186Z * [new tag] viable/strict/1761591165 -> viable/strict/1761591165 2025-12-04T08:57:44.2532933Z * [new tag] viable/strict/1761594575 -> viable/strict/1761594575 2025-12-04T08:57:44.2534024Z * [new tag] viable/strict/1761596710 -> viable/strict/1761596710 2025-12-04T08:57:44.2534944Z * [new tag] viable/strict/1761598189 -> viable/strict/1761598189 2025-12-04T08:57:44.2535870Z * [new tag] viable/strict/1761600254 -> viable/strict/1761600254 2025-12-04T08:57:44.2537088Z * [new tag] viable/strict/1761603879 -> viable/strict/1761603879 2025-12-04T08:57:44.2538082Z * [new tag] viable/strict/1761605429 -> viable/strict/1761605429 2025-12-04T08:57:44.2539138Z * [new tag] viable/strict/1761607468 -> viable/strict/1761607468 2025-12-04T08:57:44.2540164Z * [new tag] viable/strict/1761608983 -> viable/strict/1761608983 2025-12-04T08:57:44.2541184Z * [new tag] viable/strict/1761611846 -> viable/strict/1761611846 2025-12-04T08:57:44.2542176Z * [new tag] viable/strict/1761613922 -> viable/strict/1761613922 2025-12-04T08:57:44.2542935Z * [new tag] viable/strict/1761616504 -> viable/strict/1761616504 2025-12-04T08:57:44.2543754Z * [new tag] viable/strict/1761619599 -> viable/strict/1761619599 2025-12-04T08:57:44.2544745Z * [new tag] viable/strict/1761686693 -> viable/strict/1761686693 2025-12-04T08:57:44.2545691Z * [new tag] viable/strict/1761688179 -> viable/strict/1761688179 2025-12-04T08:57:44.2546545Z * [new tag] viable/strict/1761691973 -> viable/strict/1761691973 2025-12-04T08:57:44.2547695Z * [new tag] viable/strict/1761693884 -> viable/strict/1761693884 2025-12-04T08:57:44.2548649Z * [new tag] viable/strict/1761695389 -> viable/strict/1761695389 2025-12-04T08:57:44.2549713Z * [new tag] 
viable/strict/1761698408 -> viable/strict/1761698408 2025-12-04T08:57:44.2550662Z * [new tag] viable/strict/1761702931 -> viable/strict/1761702931 2025-12-04T08:57:44.2551570Z * [new tag] viable/strict/1761706307 -> viable/strict/1761706307 2025-12-04T08:57:44.2552497Z * [new tag] viable/strict/1761709065 -> viable/strict/1761709065 2025-12-04T08:57:44.2553511Z * [new tag] viable/strict/1761710285 -> viable/strict/1761710285 2025-12-04T08:57:44.2554510Z * [new tag] viable/strict/1761711983 -> viable/strict/1761711983 2025-12-04T08:57:44.2555490Z * [new tag] viable/strict/1761713514 -> viable/strict/1761713514 2025-12-04T08:57:44.2556511Z * [new tag] viable/strict/1761715523 -> viable/strict/1761715523 2025-12-04T08:57:44.2557565Z * [new tag] viable/strict/1761727973 -> viable/strict/1761727973 2025-12-04T08:57:44.2558544Z * [new tag] viable/strict/1761751558 -> viable/strict/1761751558 2025-12-04T08:57:44.2559530Z * [new tag] viable/strict/1761755187 -> viable/strict/1761755187 2025-12-04T08:57:44.2560512Z * [new tag] viable/strict/1761756826 -> viable/strict/1761756826 2025-12-04T08:57:44.2561519Z * [new tag] viable/strict/1761769551 -> viable/strict/1761769551 2025-12-04T08:57:44.2562555Z * [new tag] viable/strict/1761771032 -> viable/strict/1761771032 2025-12-04T08:57:44.2563297Z * [new tag] viable/strict/1761773101 -> viable/strict/1761773101 2025-12-04T08:57:44.2564331Z * [new tag] viable/strict/1761781792 -> viable/strict/1761781792 2025-12-04T08:57:44.2565303Z * [new tag] viable/strict/1761784788 -> viable/strict/1761784788 2025-12-04T08:57:44.2566249Z * [new tag] viable/strict/1761786740 -> viable/strict/1761786740 2025-12-04T08:57:44.2567336Z * [new tag] viable/strict/1761789332 -> viable/strict/1761789332 2025-12-04T08:57:44.2568701Z * [new tag] viable/strict/1761792569 -> viable/strict/1761792569 2025-12-04T08:57:44.2569674Z * [new tag] viable/strict/1761795289 -> viable/strict/1761795289 2025-12-04T08:57:44.2570605Z * [new tag] viable/strict/1761798345 -> viable/strict/1761798345 2025-12-04T08:57:44.2571677Z * [new tag] viable/strict/1761799827 -> viable/strict/1761799827 2025-12-04T08:57:44.2572658Z * [new tag] viable/strict/1761805604 -> viable/strict/1761805604 2025-12-04T08:57:44.2573587Z * [new tag] viable/strict/1761807202 -> viable/strict/1761807202 2025-12-04T08:57:44.2574549Z * [new tag] viable/strict/1761809094 -> viable/strict/1761809094 2025-12-04T08:57:44.2575500Z * [new tag] viable/strict/1761810576 -> viable/strict/1761810576 2025-12-04T08:57:44.2576609Z * [new tag] viable/strict/1761812771 -> viable/strict/1761812771 2025-12-04T08:57:44.2577838Z * [new tag] viable/strict/1761814363 -> viable/strict/1761814363 2025-12-04T08:57:44.2578795Z * [new tag] viable/strict/1761857410 -> viable/strict/1761857410 2025-12-04T08:57:44.2579810Z * [new tag] viable/strict/1761860985 -> viable/strict/1761860985 2025-12-04T08:57:44.2580775Z * [new tag] viable/strict/1761863094 -> viable/strict/1761863094 2025-12-04T08:57:44.2581736Z * [new tag] viable/strict/1761864590 -> viable/strict/1761864590 2025-12-04T08:57:44.2582706Z * [new tag] viable/strict/1761866675 -> viable/strict/1761866675 2025-12-04T08:57:44.2583956Z * [new tag] viable/strict/1761868178 -> viable/strict/1761868178 2025-12-04T08:57:44.2585382Z * [new tag] viable/strict/1761871111 -> viable/strict/1761871111 2025-12-04T08:57:44.2586395Z * [new tag] viable/strict/1761873126 -> viable/strict/1761873126 2025-12-04T08:57:44.2587416Z * [new tag] viable/strict/1761875714 -> viable/strict/1761875714 
2025-12-04T08:57:44.2588439Z * [new tag] viable/strict/1761878924 -> viable/strict/1761878924 2025-12-04T08:57:44.2589590Z * [new tag] viable/strict/1761881727 -> viable/strict/1761881727 2025-12-04T08:57:44.2590540Z * [new tag] viable/strict/1761882959 -> viable/strict/1761882959 2025-12-04T08:57:44.2591501Z * [new tag] viable/strict/1761886268 -> viable/strict/1761886268 2025-12-04T08:57:44.2592460Z * [new tag] viable/strict/1761893641 -> viable/strict/1761893641 2025-12-04T08:57:44.2593434Z * [new tag] viable/strict/1761931517 -> viable/strict/1761931517 2025-12-04T08:57:44.2594377Z * [new tag] viable/strict/1761933080 -> viable/strict/1761933080 2025-12-04T08:57:44.2595351Z * [new tag] viable/strict/1761935217 -> viable/strict/1761935217 2025-12-04T08:57:44.2596351Z * [new tag] viable/strict/1761938533 -> viable/strict/1761938533 2025-12-04T08:57:44.2597322Z * [new tag] viable/strict/1761940184 -> viable/strict/1761940184 2025-12-04T08:57:44.2598288Z * [new tag] viable/strict/1761942338 -> viable/strict/1761942338 2025-12-04T08:57:44.2599199Z * [new tag] viable/strict/1761946100 -> viable/strict/1761946100 2025-12-04T08:57:44.2600195Z * [new tag] viable/strict/1761947374 -> viable/strict/1761947374 2025-12-04T08:57:44.2601152Z * [new tag] viable/strict/1761950978 -> viable/strict/1761950978 2025-12-04T08:57:44.2602087Z * [new tag] viable/strict/1761957727 -> viable/strict/1761957727 2025-12-04T08:57:44.2603048Z * [new tag] viable/strict/1761959532 -> viable/strict/1761959532 2025-12-04T08:57:44.2604190Z * [new tag] viable/strict/1761965366 -> viable/strict/1761965366 2025-12-04T08:57:44.2605262Z * [new tag] viable/strict/1761968066 -> viable/strict/1761968066 2025-12-04T08:57:44.2606189Z * [new tag] viable/strict/1761969322 -> viable/strict/1761969322 2025-12-04T08:57:44.2607196Z * [new tag] viable/strict/1761974723 -> viable/strict/1761974723 2025-12-04T08:57:44.2608247Z * [new tag] viable/strict/1761981837 -> viable/strict/1761981837 2025-12-04T08:57:44.2609274Z * [new tag] viable/strict/1761985546 -> viable/strict/1761985546 2025-12-04T08:57:44.2610241Z * [new tag] viable/strict/1761987030 -> viable/strict/1761987030 2025-12-04T08:57:44.2611231Z * [new tag] viable/strict/1762003554 -> viable/strict/1762003554 2025-12-04T08:57:44.2612194Z * [new tag] viable/strict/1762021560 -> viable/strict/1762021560 2025-12-04T08:57:44.2613141Z * [new tag] viable/strict/1762032190 -> viable/strict/1762032190 2025-12-04T08:57:44.2614152Z * [new tag] viable/strict/1762040981 -> viable/strict/1762040981 2025-12-04T08:57:44.2615143Z * [new tag] viable/strict/1762048525 -> viable/strict/1762048525 2025-12-04T08:57:44.2616130Z * [new tag] viable/strict/1762104223 -> viable/strict/1762104223 2025-12-04T08:57:44.2617432Z * [new tag] viable/strict/1762105778 -> viable/strict/1762105778 2025-12-04T08:57:44.2618408Z * [new tag] viable/strict/1762115109 -> viable/strict/1762115109 2025-12-04T08:57:44.2619370Z * [new tag] viable/strict/1762125840 -> viable/strict/1762125840 2025-12-04T08:57:44.2620154Z * [new tag] viable/strict/1762127377 -> viable/strict/1762127377 2025-12-04T08:57:44.2625413Z * [new tag] viable/strict/1762134925 -> viable/strict/1762134925 2025-12-04T08:57:44.2626275Z * [new tag] viable/strict/1762138338 -> viable/strict/1762138338 2025-12-04T08:57:44.2627387Z * [new tag] viable/strict/1762148993 -> viable/strict/1762148993 2025-12-04T08:57:44.2628583Z * [new tag] viable/strict/1762152871 -> viable/strict/1762152871 2025-12-04T08:57:44.2629607Z * [new tag] viable/strict/1762156183 -> 
viable/strict/1762156183 2025-12-04T08:57:44.2630591Z * [new tag] viable/strict/1762163457 -> viable/strict/1762163457 2025-12-04T08:57:44.2631635Z * [new tag] viable/strict/1762165569 -> viable/strict/1762165569 2025-12-04T08:57:44.2632587Z * [new tag] viable/strict/1762169035 -> viable/strict/1762169035 2025-12-04T08:57:44.2633667Z * [new tag] viable/strict/1762174936 -> viable/strict/1762174936 2025-12-04T08:57:44.2634633Z * [new tag] viable/strict/1762194412 -> viable/strict/1762194412 2025-12-04T08:57:44.2635602Z * [new tag] viable/strict/1762195876 -> viable/strict/1762195876 2025-12-04T08:57:44.2636521Z * [new tag] viable/strict/1762197788 -> viable/strict/1762197788 2025-12-04T08:57:44.2637545Z * [new tag] viable/strict/1762199389 -> viable/strict/1762199389 2025-12-04T08:57:44.2638718Z * [new tag] viable/strict/1762206585 -> viable/strict/1762206585 2025-12-04T08:57:44.2639788Z * [new tag] viable/strict/1762210184 -> viable/strict/1762210184 2025-12-04T08:57:44.2640735Z * [new tag] viable/strict/1762218736 -> viable/strict/1762218736 2025-12-04T08:57:44.2641714Z * [new tag] viable/strict/1762224529 -> viable/strict/1762224529 2025-12-04T08:57:44.2642748Z * [new tag] viable/strict/1762227253 -> viable/strict/1762227253 2025-12-04T08:57:44.2643501Z * [new tag] viable/strict/1762228515 -> viable/strict/1762228515 2025-12-04T08:57:44.2644793Z * [new tag] viable/strict/1762230349 -> viable/strict/1762230349 2025-12-04T08:57:44.2645554Z * [new tag] viable/strict/1762231859 -> viable/strict/1762231859 2025-12-04T08:57:44.2646607Z * [new tag] viable/strict/1762233925 -> viable/strict/1762233925 2025-12-04T08:57:44.2647697Z * [new tag] viable/strict/1762237630 -> viable/strict/1762237630 2025-12-04T08:57:44.2648449Z * [new tag] viable/strict/1762253522 -> viable/strict/1762253522 2025-12-04T08:57:44.2649657Z * [new tag] viable/strict/1762278588 -> viable/strict/1762278588 2025-12-04T08:57:44.2650616Z * [new tag] viable/strict/1762284203 -> viable/strict/1762284203 2025-12-04T08:57:44.2651604Z * [new tag] viable/strict/1762289446 -> viable/strict/1762289446 2025-12-04T08:57:44.2652583Z * [new tag] viable/strict/1762291515 -> viable/strict/1762291515 2025-12-04T08:57:44.2653979Z * [new tag] viable/strict/1762295100 -> viable/strict/1762295100 2025-12-04T08:57:44.2654748Z * [new tag] viable/strict/1762296590 -> viable/strict/1762296590 2025-12-04T08:57:44.2655570Z * [new tag] viable/strict/1762300179 -> viable/strict/1762300179 2025-12-04T08:57:44.2656564Z * [new tag] viable/strict/1762303207 -> viable/strict/1762303207 2025-12-04T08:57:44.2657863Z * [new tag] viable/strict/1762386584 -> viable/strict/1762386584 2025-12-04T08:57:44.2658825Z * [new tag] viable/strict/1762391537 -> viable/strict/1762391537 2025-12-04T08:57:44.2659620Z * [new tag] viable/strict/1762394119 -> viable/strict/1762394119 2025-12-04T08:57:44.2661109Z * [new tag] viable/strict/1762397437 -> viable/strict/1762397437 2025-12-04T08:57:44.2662122Z * [new tag] viable/strict/1762400256 -> viable/strict/1762400256 2025-12-04T08:57:44.2663105Z * [new tag] viable/strict/1762401469 -> viable/strict/1762401469 2025-12-04T08:57:44.2664114Z * [new tag] viable/strict/1762408195 -> viable/strict/1762408195 2025-12-04T08:57:44.2665196Z * [new tag] viable/strict/1762410411 -> viable/strict/1762410411 2025-12-04T08:57:44.2666218Z * [new tag] viable/strict/1762417613 -> viable/strict/1762417613 2025-12-04T08:57:44.2667220Z * [new tag] viable/strict/1762419198 -> viable/strict/1762419198 2025-12-04T08:57:44.2668221Z * [new tag] 
viable/strict/1762422656 -> viable/strict/1762422656 2025-12-04T08:57:44.2669726Z * [new tag] viable/strict/1762424746 -> viable/strict/1762424746 2025-12-04T08:57:44.2670730Z * [new tag] viable/strict/1762446386 -> viable/strict/1762446386 2025-12-04T08:57:44.2671707Z * [new tag] viable/strict/1762449912 -> viable/strict/1762449912 2025-12-04T08:57:44.2672698Z * [new tag] viable/strict/1762457031 -> viable/strict/1762457031 2025-12-04T08:57:44.2673756Z * [new tag] viable/strict/1762462441 -> viable/strict/1762462441 2025-12-04T08:57:44.2674735Z * [new tag] viable/strict/1762467909 -> viable/strict/1762467909 2025-12-04T08:57:44.2675733Z * [new tag] viable/strict/1762471493 -> viable/strict/1762471493 2025-12-04T08:57:44.2676783Z * [new tag] viable/strict/1762475990 -> viable/strict/1762475990 2025-12-04T08:57:44.2677823Z * [new tag] viable/strict/1762477933 -> viable/strict/1762477933 2025-12-04T08:57:44.2678788Z * [new tag] viable/strict/1762491053 -> viable/strict/1762491053 2025-12-04T08:57:44.2679769Z * [new tag] viable/strict/1762493118 -> viable/strict/1762493118 2025-12-04T08:57:44.2680690Z * [new tag] viable/strict/1762498442 -> viable/strict/1762498442 2025-12-04T08:57:44.2681733Z * [new tag] viable/strict/1762501778 -> viable/strict/1762501778 2025-12-04T08:57:44.2682687Z * [new tag] viable/strict/1762504001 -> viable/strict/1762504001 2025-12-04T08:57:44.2683776Z * [new tag] viable/strict/1762505583 -> viable/strict/1762505583 2025-12-04T08:57:44.2684823Z * [new tag] viable/strict/1762507523 -> viable/strict/1762507523 2025-12-04T08:57:44.2685841Z * [new tag] viable/strict/1762511140 -> viable/strict/1762511140 2025-12-04T08:57:44.2686961Z * [new tag] viable/strict/1762512632 -> viable/strict/1762512632 2025-12-04T08:57:44.2687970Z * [new tag] viable/strict/1762520467 -> viable/strict/1762520467 2025-12-04T08:57:44.2688935Z * [new tag] viable/strict/1762522016 -> viable/strict/1762522016 2025-12-04T08:57:44.2689887Z * [new tag] viable/strict/1762530591 -> viable/strict/1762530591 2025-12-04T08:57:44.2690843Z * [new tag] viable/strict/1762543405 -> viable/strict/1762543405 2025-12-04T08:57:44.2691614Z * [new tag] viable/strict/1762544998 -> viable/strict/1762544998 2025-12-04T08:57:44.2693272Z * [new tag] viable/strict/1762552182 -> viable/strict/1762552182 2025-12-04T08:57:44.2693680Z * [new tag] viable/strict/1762554297 -> viable/strict/1762554297 2025-12-04T08:57:44.2695255Z * [new tag] viable/strict/1762559381 -> viable/strict/1762559381 2025-12-04T08:57:44.2695475Z * [new tag] viable/strict/1762562222 -> viable/strict/1762562222 2025-12-04T08:57:44.2696494Z * [new tag] viable/strict/1762564319 -> viable/strict/1762564319 2025-12-04T08:57:44.2697612Z * [new tag] viable/strict/1762566904 -> viable/strict/1762566904 2025-12-04T08:57:44.2698604Z * [new tag] viable/strict/1762569781 -> viable/strict/1762569781 2025-12-04T08:57:44.2699610Z * [new tag] viable/strict/1762575940 -> viable/strict/1762575940 2025-12-04T08:57:44.2700591Z * [new tag] viable/strict/1762580974 -> viable/strict/1762580974 2025-12-04T08:57:44.2701582Z * [new tag] viable/strict/1762583185 -> viable/strict/1762583185 2025-12-04T08:57:44.2702587Z * [new tag] viable/strict/1762586647 -> viable/strict/1762586647 2025-12-04T08:57:44.2703645Z * [new tag] viable/strict/1762588183 -> viable/strict/1762588183 2025-12-04T08:57:44.2704653Z * [new tag] viable/strict/1762593886 -> viable/strict/1762593886 2025-12-04T08:57:44.2705740Z * [new tag] viable/strict/1762650743 -> viable/strict/1762650743 
2025-12-04T08:57:44.2706824Z * [new tag] viable/strict/1762653328 -> viable/strict/1762653328 2025-12-04T08:57:44.2707846Z * [new tag] viable/strict/1762659342 -> viable/strict/1762659342 2025-12-04T08:57:44.2708935Z * [new tag] viable/strict/1762662360 -> viable/strict/1762662360 2025-12-04T08:57:44.2709910Z * [new tag] viable/strict/1762667377 -> viable/strict/1762667377 2025-12-04T08:57:44.2710868Z * [new tag] viable/strict/1762671090 -> viable/strict/1762671090 2025-12-04T08:57:44.2711856Z * [new tag] viable/strict/1762680284 -> viable/strict/1762680284 2025-12-04T08:57:44.2712829Z * [new tag] viable/strict/1762683900 -> viable/strict/1762683900 2025-12-04T08:57:44.2713801Z * [new tag] viable/strict/1762705541 -> viable/strict/1762705541 2025-12-04T08:57:44.2714760Z * [new tag] viable/strict/1762709004 -> viable/strict/1762709004 2025-12-04T08:57:44.2715785Z * [new tag] viable/strict/1762746004 -> viable/strict/1762746004 2025-12-04T08:57:44.2716848Z * [new tag] viable/strict/1762748799 -> viable/strict/1762748799 2025-12-04T08:57:44.2717908Z * [new tag] viable/strict/1762759504 -> viable/strict/1762759504 2025-12-04T08:57:44.2719354Z * [new tag] viable/strict/1762760973 -> viable/strict/1762760973 2025-12-04T08:57:44.2720354Z * [new tag] viable/strict/1762775374 -> viable/strict/1762775374 2025-12-04T08:57:44.2721756Z * [new tag] viable/strict/1762777661 -> viable/strict/1762777661 2025-12-04T08:57:44.2722777Z * [new tag] viable/strict/1762779774 -> viable/strict/1762779774 2025-12-04T08:57:44.2723973Z * [new tag] viable/strict/1762781259 -> viable/strict/1762781259 2025-12-04T08:57:44.2725113Z * [new tag] viable/strict/1762793628 -> viable/strict/1762793628 2025-12-04T08:57:44.2726201Z * [new tag] viable/strict/1762800711 -> viable/strict/1762800711 2025-12-04T08:57:44.2727197Z * [new tag] viable/strict/1762809894 -> viable/strict/1762809894 2025-12-04T08:57:44.2728180Z * [new tag] viable/strict/1762811384 -> viable/strict/1762811384 2025-12-04T08:57:44.2729250Z * [new tag] viable/strict/1762813841 -> viable/strict/1762813841 2025-12-04T08:57:44.2730271Z * [new tag] viable/strict/1762815047 -> viable/strict/1762815047 2025-12-04T08:57:44.2731452Z * [new tag] viable/strict/1762817094 -> viable/strict/1762817094 2025-12-04T08:57:44.2732656Z * [new tag] viable/strict/1762818582 -> viable/strict/1762818582 2025-12-04T08:57:44.2733784Z * [new tag] viable/strict/1762821623 -> viable/strict/1762821623 2025-12-04T08:57:44.2734550Z * [new tag] viable/strict/1762823531 -> viable/strict/1762823531 2025-12-04T08:57:44.2735647Z * [new tag] viable/strict/1762849583 -> viable/strict/1762849583 2025-12-04T08:57:44.2736661Z * [new tag] viable/strict/1762851200 -> viable/strict/1762851200 2025-12-04T08:57:44.2737916Z * [new tag] viable/strict/1762854603 -> viable/strict/1762854603 2025-12-04T08:57:44.2738975Z * [new tag] viable/strict/1762858276 -> viable/strict/1762858276 2025-12-04T08:57:44.2740166Z * [new tag] viable/strict/1762860891 -> viable/strict/1762860891 2025-12-04T08:57:44.2741769Z * [new tag] viable/strict/1762866174 -> viable/strict/1762866174 2025-12-04T08:57:44.2742779Z * [new tag] viable/strict/1762867653 -> viable/strict/1762867653 2025-12-04T08:57:44.2743788Z * [new tag] viable/strict/1762872669 -> viable/strict/1762872669 2025-12-04T08:57:44.2744587Z * [new tag] viable/strict/1762878380 -> viable/strict/1762878380 2025-12-04T08:57:44.2745712Z * [new tag] viable/strict/1762889003 -> viable/strict/1762889003 2025-12-04T08:57:44.2746766Z * [new tag] viable/strict/1762890589 -> 
viable/strict/1762890589 2025-12-04T08:57:44.2747779Z * [new tag] viable/strict/1762892743 -> viable/strict/1762892743 2025-12-04T08:57:44.2748899Z * [new tag] viable/strict/1762894271 -> viable/strict/1762894271 2025-12-04T08:57:44.2749674Z * [new tag] viable/strict/1762896287 -> viable/strict/1762896287 2025-12-04T08:57:44.2750696Z * [new tag] viable/strict/1762915871 -> viable/strict/1762915871 2025-12-04T08:57:44.2751760Z * [new tag] viable/strict/1762918569 -> viable/strict/1762918569 2025-12-04T08:57:44.2752524Z * [new tag] viable/strict/1762919776 -> viable/strict/1762919776 2025-12-04T08:57:44.2753574Z * [new tag] viable/strict/1762923072 -> viable/strict/1762923072 2025-12-04T08:57:44.2754543Z * [new tag] viable/strict/1762928826 -> viable/strict/1762928826 2025-12-04T08:57:44.2755642Z * [new tag] viable/strict/1762930451 -> viable/strict/1762930451 2025-12-04T08:57:44.2756748Z * [new tag] viable/strict/1762933780 -> viable/strict/1762933780 2025-12-04T08:57:44.2757588Z * [new tag] viable/strict/1762937638 -> viable/strict/1762937638 2025-12-04T08:57:44.2758790Z * [new tag] viable/strict/1762939545 -> viable/strict/1762939545 2025-12-04T08:57:44.2759796Z * [new tag] viable/strict/1762962692 -> viable/strict/1762962692 2025-12-04T08:57:44.2760764Z * [new tag] viable/strict/1762979143 -> viable/strict/1762979143 2025-12-04T08:57:44.2761754Z * [new tag] viable/strict/1762984188 -> viable/strict/1762984188 2025-12-04T08:57:44.2762508Z * [new tag] viable/strict/1762986306 -> viable/strict/1762986306 2025-12-04T08:57:44.2763575Z * [new tag] viable/strict/1762989903 -> viable/strict/1762989903 2025-12-04T08:57:44.2764559Z * [new tag] viable/strict/1762991377 -> viable/strict/1762991377 2025-12-04T08:57:44.2765608Z * [new tag] viable/strict/1762998921 -> viable/strict/1762998921 2025-12-04T08:57:44.2766675Z * [new tag] viable/strict/1763002287 -> viable/strict/1763002287 2025-12-04T08:57:44.2767680Z * [new tag] viable/strict/1763016840 -> viable/strict/1763016840 2025-12-04T08:57:44.2768649Z * [new tag] viable/strict/1763020180 -> viable/strict/1763020180 2025-12-04T08:57:44.2769781Z * [new tag] viable/strict/1763027421 -> viable/strict/1763027421 2025-12-04T08:57:44.2770709Z * [new tag] viable/strict/1763031120 -> viable/strict/1763031120 2025-12-04T08:57:44.2771710Z * [new tag] viable/strict/1763036861 -> viable/strict/1763036861 2025-12-04T08:57:44.2772797Z * [new tag] viable/strict/1763038993 -> viable/strict/1763038993 2025-12-04T08:57:44.2773905Z * [new tag] viable/strict/1763054703 -> viable/strict/1763054703 2025-12-04T08:57:44.2774698Z * [new tag] viable/strict/1763067061 -> viable/strict/1763067061 2025-12-04T08:57:44.2775707Z * [new tag] viable/strict/1763070847 -> viable/strict/1763070847 2025-12-04T08:57:44.2776992Z * [new tag] viable/strict/1763072706 -> viable/strict/1763072706 2025-12-04T08:57:44.2778155Z * [new tag] viable/strict/1763076302 -> viable/strict/1763076302 2025-12-04T08:57:44.2779122Z * [new tag] viable/strict/1763080816 -> viable/strict/1763080816 2025-12-04T08:57:44.2780138Z * [new tag] viable/strict/1763082732 -> viable/strict/1763082732 2025-12-04T08:57:44.2781166Z * [new tag] viable/strict/1763085329 -> viable/strict/1763085329 2025-12-04T08:57:44.2782196Z * [new tag] viable/strict/1763088623 -> viable/strict/1763088623 2025-12-04T08:57:44.2783297Z * [new tag] viable/strict/1763091402 -> viable/strict/1763091402 2025-12-04T08:57:44.2784318Z * [new tag] viable/strict/1763092602 -> viable/strict/1763092602 2025-12-04T08:57:44.2785323Z * [new tag] 
viable/strict/1763094355 -> viable/strict/1763094355 2025-12-04T08:57:44.2786825Z * [new tag] viable/strict/1763099390 -> viable/strict/1763099390 2025-12-04T08:57:44.2787870Z * [new tag] viable/strict/1763101608 -> viable/strict/1763101608 2025-12-04T08:57:44.2789043Z * [new tag] viable/strict/1763105102 -> viable/strict/1763105102 2025-12-04T08:57:44.2790101Z * [new tag] viable/strict/1763112347 -> viable/strict/1763112347 2025-12-04T08:57:44.2791106Z * [new tag] viable/strict/1763119471 -> viable/strict/1763119471 2025-12-04T08:57:44.2791895Z * [new tag] viable/strict/1763126835 -> viable/strict/1763126835 2025-12-04T08:57:44.2792809Z * [new tag] viable/strict/1763149779 -> viable/strict/1763149779 2025-12-04T08:57:44.2793841Z * [new tag] viable/strict/1763164178 -> viable/strict/1763164178 2025-12-04T08:57:44.2794692Z * [new tag] viable/strict/1763167104 -> viable/strict/1763167104 2025-12-04T08:57:44.2795679Z * [new tag] viable/strict/1763169132 -> viable/strict/1763169132 2025-12-04T08:57:44.2796659Z * [new tag] viable/strict/1763171708 -> viable/strict/1763171708 2025-12-04T08:57:44.2797619Z * [new tag] viable/strict/1763174759 -> viable/strict/1763174759 2025-12-04T08:57:44.2798696Z * [new tag] viable/strict/1763180744 -> viable/strict/1763180744 2025-12-04T08:57:44.2799667Z * [new tag] viable/strict/1763182227 -> viable/strict/1763182227 2025-12-04T08:57:44.2800624Z * [new tag] viable/strict/1763184309 -> viable/strict/1763184309 2025-12-04T08:57:44.2802126Z * [new tag] viable/strict/1763187991 -> viable/strict/1763187991 2025-12-04T08:57:44.2803097Z * [new tag] viable/strict/1763191445 -> viable/strict/1763191445 2025-12-04T08:57:44.2804318Z * [new tag] viable/strict/1763195152 -> viable/strict/1763195152 2025-12-04T08:57:44.2805079Z * [new tag] viable/strict/1763205769 -> viable/strict/1763205769 2025-12-04T08:57:44.2806219Z * [new tag] viable/strict/1763246990 -> viable/strict/1763246990 2025-12-04T08:57:44.2807317Z * [new tag] viable/strict/1763261578 -> viable/strict/1763261578 2025-12-04T08:57:44.2808138Z * [new tag] viable/strict/1763286573 -> viable/strict/1763286573 2025-12-04T08:57:44.2809036Z * [new tag] viable/strict/1763292167 -> viable/strict/1763292167 2025-12-04T08:57:44.2810035Z * [new tag] viable/strict/1763333386 -> viable/strict/1763333386 2025-12-04T08:57:44.2811018Z * [new tag] viable/strict/1763340082 -> viable/strict/1763340082 2025-12-04T08:57:44.2812772Z * [new tag] viable/strict/1763364324 -> viable/strict/1763364324 2025-12-04T08:57:44.2813805Z * [new tag] viable/strict/1763371569 -> viable/strict/1763371569 2025-12-04T08:57:44.2814835Z * [new tag] viable/strict/1763373067 -> viable/strict/1763373067 2025-12-04T08:57:44.2815764Z * [new tag] viable/strict/1763375157 -> viable/strict/1763375157 2025-12-04T08:57:44.2817061Z * [new tag] viable/strict/1763382462 -> viable/strict/1763382462 2025-12-04T08:57:44.2818117Z * [new tag] viable/strict/1763394661 -> viable/strict/1763394661 2025-12-04T08:57:44.2819384Z * [new tag] viable/strict/1763396797 -> viable/strict/1763396797 2025-12-04T08:57:44.2820438Z * [new tag] viable/strict/1763398542 -> viable/strict/1763398542 2025-12-04T08:57:44.2821711Z * [new tag] viable/strict/1763401807 -> viable/strict/1763401807 2025-12-04T08:57:44.2822571Z * [new tag] viable/strict/1763414698 -> viable/strict/1763414698 2025-12-04T08:57:44.2823688Z * [new tag] viable/strict/1763419807 -> viable/strict/1763419807 2025-12-04T08:57:44.2824723Z * [new tag] viable/strict/1763426369 -> viable/strict/1763426369 
2025-12-04T08:57:44.2825757Z * [new tag] viable/strict/1763428331 -> viable/strict/1763428331 2025-12-04T08:57:44.2826826Z * [new tag] viable/strict/1763430922 -> viable/strict/1763430922 2025-12-04T08:57:44.2827648Z * [new tag] viable/strict/1763434184 -> viable/strict/1763434184 2025-12-04T08:57:44.2828692Z * [new tag] viable/strict/1763439973 -> viable/strict/1763439973 2025-12-04T08:57:44.2829775Z * [new tag] viable/strict/1763444995 -> viable/strict/1763444995 2025-12-04T08:57:44.2830876Z * [new tag] viable/strict/1763447206 -> viable/strict/1763447206 2025-12-04T08:57:44.2831913Z * [new tag] viable/strict/1763448826 -> viable/strict/1763448826 2025-12-04T08:57:44.2832944Z * [new tag] viable/strict/1763450717 -> viable/strict/1763450717 2025-12-04T08:57:44.2834005Z * [new tag] viable/strict/1763452183 -> viable/strict/1763452183 2025-12-04T08:57:44.2835116Z * [new tag] viable/strict/1763457945 -> viable/strict/1763457945 2025-12-04T08:57:44.2836104Z * [new tag] viable/strict/1763459439 -> viable/strict/1763459439 2025-12-04T08:57:44.2837129Z * [new tag] viable/strict/1763461556 -> viable/strict/1763461556 2025-12-04T08:57:44.2838086Z * [new tag] viable/strict/1763463103 -> viable/strict/1763463103 2025-12-04T08:57:44.2839109Z * [new tag] viable/strict/1763465100 -> viable/strict/1763465100 2025-12-04T08:57:44.2840076Z * [new tag] viable/strict/1763468866 -> viable/strict/1763468866 2025-12-04T08:57:44.2840822Z * [new tag] viable/strict/1763493823 -> viable/strict/1763493823 2025-12-04T08:57:44.2841651Z * [new tag] viable/strict/1763496249 -> viable/strict/1763496249 2025-12-04T08:57:44.2842690Z * [new tag] viable/strict/1763502620 -> viable/strict/1763502620 2025-12-04T08:57:44.2843812Z * [new tag] viable/strict/1763504715 -> viable/strict/1763504715 2025-12-04T08:57:44.2844774Z * [new tag] viable/strict/1763506208 -> viable/strict/1763506208 2025-12-04T08:57:44.2845773Z * [new tag] viable/strict/1763520590 -> viable/strict/1763520590 2025-12-04T08:57:44.2846778Z * [new tag] viable/strict/1763523357 -> viable/strict/1763523357 2025-12-04T08:57:44.2847866Z * [new tag] viable/strict/1763529922 -> viable/strict/1763529922 2025-12-04T08:57:44.2848906Z * [new tag] viable/strict/1763531408 -> viable/strict/1763531408 2025-12-04T08:57:44.2849861Z * [new tag] viable/strict/1763533622 -> viable/strict/1763533622 2025-12-04T08:57:44.2850844Z * [new tag] viable/strict/1763538576 -> viable/strict/1763538576 2025-12-04T08:57:44.2851994Z * [new tag] viable/strict/1763545823 -> viable/strict/1763545823 2025-12-04T08:57:44.2853148Z * [new tag] viable/strict/1763547951 -> viable/strict/1763547951 2025-12-04T08:57:44.2854191Z * [new tag] viable/strict/1763551477 -> viable/strict/1763551477 2025-12-04T08:57:44.2855176Z * [new tag] viable/strict/1763552982 -> viable/strict/1763552982 2025-12-04T08:57:44.2856142Z * [new tag] viable/strict/1763594698 -> viable/strict/1763594698 2025-12-04T08:57:44.2857526Z * [new tag] viable/strict/1763596178 -> viable/strict/1763596178 2025-12-04T08:57:44.2858561Z * [new tag] viable/strict/1763599155 -> viable/strict/1763599155 2025-12-04T08:57:44.2859565Z * [new tag] viable/strict/1763603717 -> viable/strict/1763603717 2025-12-04T08:57:44.2860598Z * [new tag] viable/strict/1763606923 -> viable/strict/1763606923 2025-12-04T08:57:44.2861643Z * [new tag] viable/strict/1763609715 -> viable/strict/1763609715 2025-12-04T08:57:44.2862628Z * [new tag] viable/strict/1763612757 -> viable/strict/1763612757 2025-12-04T08:57:44.2863622Z * [new tag] viable/strict/1763616325 -> 
viable/strict/1763616325 2025-12-04T08:57:44.2864616Z * [new tag] viable/strict/1763623509 -> viable/strict/1763623509 2025-12-04T08:57:44.2865866Z * [new tag] viable/strict/1763624984 -> viable/strict/1763624984 2025-12-04T08:57:44.2866854Z * [new tag] viable/strict/1763628796 -> viable/strict/1763628796 2025-12-04T08:57:44.2868050Z * [new tag] viable/strict/1763634343 -> viable/strict/1763634343 2025-12-04T08:57:44.2868860Z * [new tag] viable/strict/1763635867 -> viable/strict/1763635867 2025-12-04T08:57:44.2870056Z * [new tag] viable/strict/1763639382 -> viable/strict/1763639382 2025-12-04T08:57:44.2871054Z * [new tag] viable/strict/1763646626 -> viable/strict/1763646626 2025-12-04T08:57:44.2872162Z * [new tag] viable/strict/1763655997 -> viable/strict/1763655997 2025-12-04T08:57:44.2873294Z * [new tag] viable/strict/1763659444 -> viable/strict/1763659444 2025-12-04T08:57:44.2874247Z * [new tag] viable/strict/1763660992 -> viable/strict/1763660992 2025-12-04T08:57:44.2875180Z * [new tag] viable/strict/1763663201 -> viable/strict/1763663201 2025-12-04T08:57:44.2876203Z * [new tag] viable/strict/1763670362 -> viable/strict/1763670362 2025-12-04T08:57:44.2876986Z * [new tag] viable/strict/1763675378 -> viable/strict/1763675378 2025-12-04T08:57:44.2878012Z * [new tag] viable/strict/1763693343 -> viable/strict/1763693343 2025-12-04T08:57:44.2878947Z * [new tag] viable/strict/1763696088 -> viable/strict/1763696088 2025-12-04T08:57:44.2880099Z * [new tag] viable/strict/1763697343 -> viable/strict/1763697343 2025-12-04T08:57:44.2881060Z * [new tag] viable/strict/1763699165 -> viable/strict/1763699165 2025-12-04T08:57:44.2882019Z * [new tag] viable/strict/1763700660 -> viable/strict/1763700660 2025-12-04T08:57:44.2882969Z * [new tag] viable/strict/1763704209 -> viable/strict/1763704209 2025-12-04T08:57:44.2883985Z * [new tag] viable/strict/1763706411 -> viable/strict/1763706411 2025-12-04T08:57:44.2884927Z * [new tag] viable/strict/1763708082 -> viable/strict/1763708082 2025-12-04T08:57:44.2885738Z * [new tag] viable/strict/1763711381 -> viable/strict/1763711381 2025-12-04T08:57:44.2886660Z * [new tag] viable/strict/1763713593 -> viable/strict/1763713593 2025-12-04T08:57:44.2887690Z * [new tag] viable/strict/1763715201 -> viable/strict/1763715201 2025-12-04T08:57:44.2888632Z * [new tag] viable/strict/1763733017 -> viable/strict/1763733017 2025-12-04T08:57:44.2889646Z * [new tag] viable/strict/1763735108 -> viable/strict/1763735108 2025-12-04T08:57:44.2890604Z * [new tag] viable/strict/1763749579 -> viable/strict/1763749579 2025-12-04T08:57:44.2891577Z * [new tag] viable/strict/1763751113 -> viable/strict/1763751113 2025-12-04T08:57:44.2892558Z * [new tag] viable/strict/1763753035 -> viable/strict/1763753035 2025-12-04T08:57:44.2893549Z * [new tag] viable/strict/1763754578 -> viable/strict/1763754578 2025-12-04T08:57:44.2894567Z * [new tag] viable/strict/1763756748 -> viable/strict/1763756748 2025-12-04T08:57:44.2895525Z * [new tag] viable/strict/1763758205 -> viable/strict/1763758205 2025-12-04T08:57:44.2896372Z * [new tag] viable/strict/1763764050 -> viable/strict/1763764050 2025-12-04T08:57:44.2897644Z * [new tag] viable/strict/1763771887 -> viable/strict/1763771887 2025-12-04T08:57:44.2898811Z * [new tag] viable/strict/1763773920 -> viable/strict/1763773920 2025-12-04T08:57:44.2899809Z * [new tag] viable/strict/1763776501 -> viable/strict/1763776501 2025-12-04T08:57:44.2900768Z * [new tag] viable/strict/1763779437 -> viable/strict/1763779437 2025-12-04T08:57:44.2902008Z * [new tag] 
viable/strict/1763781038 -> viable/strict/1763781038 2025-12-04T08:57:44.2902771Z * [new tag] viable/strict/1763782245 -> viable/strict/1763782245 2025-12-04T08:57:44.2903968Z * [new tag] viable/strict/1763785568 -> viable/strict/1763785568 2025-12-04T08:57:44.2905021Z * [new tag] viable/strict/1763787006 -> viable/strict/1763787006 2025-12-04T08:57:44.2906110Z * [new tag] viable/strict/1763789103 -> viable/strict/1763789103 2025-12-04T08:57:44.2907070Z * [new tag] viable/strict/1763790578 -> viable/strict/1763790578 2025-12-04T08:57:44.2908075Z * [new tag] viable/strict/1763796275 -> viable/strict/1763796275 2025-12-04T08:57:44.2909456Z * [new tag] viable/strict/1763801465 -> viable/strict/1763801465 2025-12-04T08:57:44.2910405Z * [new tag] viable/strict/1763803522 -> viable/strict/1763803522 2025-12-04T08:57:44.2911351Z * [new tag] viable/strict/1763808581 -> viable/strict/1763808581 2025-12-04T08:57:44.2912341Z * [new tag] viable/strict/1763840977 -> viable/strict/1763840977 2025-12-04T08:57:44.2913284Z * [new tag] viable/strict/1763846659 -> viable/strict/1763846659 2025-12-04T08:57:44.2914238Z * [new tag] viable/strict/1763872065 -> viable/strict/1763872065 2025-12-04T08:57:44.2915306Z * [new tag] viable/strict/1763873648 -> viable/strict/1763873648 2025-12-04T08:57:44.2916307Z * [new tag] viable/strict/1763875506 -> viable/strict/1763875506 2025-12-04T08:57:44.2917039Z * [new tag] viable/strict/1763889904 -> viable/strict/1763889904 2025-12-04T08:57:44.2918549Z * [new tag] viable/strict/1763930999 -> viable/strict/1763930999 2025-12-04T08:57:44.2919537Z * [new tag] viable/strict/1763944964 -> viable/strict/1763944964 2025-12-04T08:57:44.2920317Z * [new tag] viable/strict/1763958474 -> viable/strict/1763958474 2025-12-04T08:57:44.2921743Z * [new tag] viable/strict/1763967263 -> viable/strict/1763967263 2025-12-04T08:57:44.2922780Z * [new tag] viable/strict/1763972803 -> viable/strict/1763972803 2025-12-04T08:57:44.2923762Z * [new tag] viable/strict/1763976376 -> viable/strict/1763976376 2025-12-04T08:57:44.2924787Z * [new tag] viable/strict/1763989404 -> viable/strict/1763989404 2025-12-04T08:57:44.2925754Z * [new tag] viable/strict/1763990887 -> viable/strict/1763990887 2025-12-04T08:57:44.2926742Z * [new tag] viable/strict/1764019919 -> viable/strict/1764019919 2025-12-04T08:57:44.2927776Z * [new tag] viable/strict/1764023134 -> viable/strict/1764023134 2025-12-04T08:57:44.2928579Z * [new tag] viable/strict/1764024593 -> viable/strict/1764024593 2025-12-04T08:57:44.2929588Z * [new tag] viable/strict/1764026706 -> viable/strict/1764026706 2025-12-04T08:57:44.2930893Z * [new tag] viable/strict/1764031139 -> viable/strict/1764031139 2025-12-04T08:57:44.2931890Z * [new tag] viable/strict/1764033131 -> viable/strict/1764033131 2025-12-04T08:57:44.2932738Z * [new tag] viable/strict/1764035725 -> viable/strict/1764035725 2025-12-04T08:57:44.2933657Z * [new tag] viable/strict/1764624265 -> viable/strict/1764624265 2025-12-04T08:57:44.2934475Z * [new tag] viable/strict/1764631514 -> viable/strict/1764631514 2025-12-04T08:57:44.2935292Z * [new tag] viable/strict/1764632987 -> viable/strict/1764632987 2025-12-04T08:57:44.2936077Z * [new tag] viable/strict/1764636063 -> viable/strict/1764636063 2025-12-04T08:57:44.2937357Z * [new tag] viable/strict/1764643975 -> viable/strict/1764643975 2025-12-04T08:57:44.2938355Z * [new tag] viable/strict/1764646859 -> viable/strict/1764646859 2025-12-04T08:57:44.2939133Z * [new tag] viable/strict/1764653120 -> viable/strict/1764653120 
2025-12-04T08:57:44.2940060Z * [new tag] viable/strict/1764654632 -> viable/strict/1764654632 2025-12-04T08:57:44.2940782Z * [new tag] viable/strict/1764656821 -> viable/strict/1764656821 2025-12-04T08:57:44.2941618Z * [new tag] viable/strict/1764658557 -> viable/strict/1764658557 2025-12-04T08:57:44.2942425Z * [new tag] viable/strict/1764660333 -> viable/strict/1764660333 2025-12-04T08:57:44.2943246Z * [new tag] viable/strict/1764661812 -> viable/strict/1764661812 2025-12-04T08:57:44.2944047Z * [new tag] viable/strict/1764664023 -> viable/strict/1764664023 2025-12-04T08:57:44.2944876Z * [new tag] viable/strict/1764669150 -> viable/strict/1764669150 2025-12-04T08:57:44.2945692Z * [new tag] viable/strict/1764680709 -> viable/strict/1764680709 2025-12-04T08:57:44.2946499Z * [new tag] viable/strict/1764687619 -> viable/strict/1764687619 2025-12-04T08:57:44.2947337Z * [new tag] viable/strict/1764696355 -> viable/strict/1764696355 2025-12-04T08:57:44.2948156Z * [new tag] viable/strict/1764701767 -> viable/strict/1764701767 2025-12-04T08:57:44.2949113Z * [new tag] viable/strict/1764710768 -> viable/strict/1764710768 2025-12-04T08:57:44.2949905Z * [new tag] viable/strict/1764716202 -> viable/strict/1764716202 2025-12-04T08:57:44.2950714Z * [new tag] viable/strict/1764793566 -> viable/strict/1764793566 2025-12-04T08:57:44.2951515Z * [new tag] viable/strict/1764797093 -> viable/strict/1764797093 2025-12-04T08:57:44.2952416Z * [new tag] viable/strict/1764800729 -> viable/strict/1764800729 2025-12-04T08:57:44.2953336Z * [new tag] whc_flight_1 -> whc_flight_1 2025-12-04T08:57:44.2954698Z * [new tag] whc_flight_2 -> whc_flight_2 2025-12-04T08:57:44.2955855Z * [new tag] whc_flight_4 -> whc_flight_4 2025-12-04T08:57:44.3634557Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T08:57:44.3658485Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:44.3661878Z ##[endgroup] 2025-12-04T08:57:44.3662228Z ##[group]Determining the checkout info 2025-12-04T08:57:44.3662955Z ##[endgroup] 2025-12-04T08:57:44.3666846Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T08:57:44.3697690Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T08:57:44.3723481Z ##[group]Checking out the ref 2025-12-04T08:57:44.3727002Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:45.4258713Z Updating files: 80% (16107/20121) 2025-12-04T08:57:45.4553593Z Updating files: 81% (16299/20121) 2025-12-04T08:57:45.4771401Z Updating files: 82% (16500/20121) 2025-12-04T08:57:45.4921920Z Updating files: 83% (16701/20121) 2025-12-04T08:57:45.5058242Z Updating files: 84% (16902/20121) 2025-12-04T08:57:45.5218494Z Updating files: 85% (17103/20121) 2025-12-04T08:57:45.5377368Z Updating files: 86% (17305/20121) 2025-12-04T08:57:45.5514750Z Updating files: 87% (17506/20121) 2025-12-04T08:57:45.5624777Z Updating files: 88% (17707/20121) 2025-12-04T08:57:45.5765262Z Updating files: 89% (17908/20121) 2025-12-04T08:57:45.5938677Z Updating files: 90% (18109/20121) 2025-12-04T08:57:45.6053441Z Updating files: 91% (18311/20121) 2025-12-04T08:57:45.6207781Z Updating files: 92% (18512/20121) 2025-12-04T08:57:45.6392744Z Updating files: 93% (18713/20121) 2025-12-04T08:57:45.6600713Z Updating files: 94% (18914/20121) 2025-12-04T08:57:45.6777519Z Updating files: 95% (19115/20121) 2025-12-04T08:57:45.6932061Z Updating files: 96% (19317/20121) 2025-12-04T08:57:45.7098211Z Updating files: 97% 
(19518/20121)
2025-12-04T08:57:45.7392404Z Updating files: 98% (19719/20121)
2025-12-04T08:57:45.7566870Z Updating files: 99% (19920/20121)
2025-12-04T08:57:45.7567390Z Updating files: 100% (20121/20121)
2025-12-04T08:57:45.7568044Z Updating files: 100% (20121/20121), done.
2025-12-04T08:57:45.7853750Z Note: switching to 'ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32'.
2025-12-04T08:57:45.7854301Z
2025-12-04T08:57:45.7854622Z You are in 'detached HEAD' state. You can look around, make experimental
2025-12-04T08:57:45.7855272Z changes and commit them, and you can discard any commits you make in this
2025-12-04T08:57:45.7855889Z state without impacting any branches by switching back to a branch.
2025-12-04T08:57:45.7856267Z
2025-12-04T08:57:45.7856622Z If you want to create a new branch to retain commits you create, you may
2025-12-04T08:57:45.7857385Z do so (now or later) by using -c with the switch command. Example:
2025-12-04T08:57:45.7857724Z
2025-12-04T08:57:45.7857867Z git switch -c <new-branch-name>
2025-12-04T08:57:45.7858096Z
2025-12-04T08:57:45.7858222Z Or undo this operation with:
2025-12-04T08:57:45.7858440Z
2025-12-04T08:57:45.7858543Z git switch -
2025-12-04T08:57:45.7858705Z
2025-12-04T08:57:45.7858997Z Turn off this advice by setting config variable advice.detachedHead to false
2025-12-04T08:57:45.7859414Z
2025-12-04T08:57:45.7859738Z HEAD is now at ffd9b0fb435 Resolve collective autotuning test failure on arm (#168919)
2025-12-04T08:57:45.7939667Z ##[endgroup]
2025-12-04T08:57:45.7940193Z ##[group]Setting up auth for fetching submodules
2025-12-04T08:57:45.7946335Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T08:57:45.7998379Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf
2025-12-04T08:57:45.8032058Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com:
2025-12-04T08:57:45.8058436Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com:
2025-12-04T08:57:45.8083017Z ##[endgroup]
2025-12-04T08:57:45.8083498Z ##[group]Fetching submodules
2025-12-04T08:57:45.8086037Z [command]/usr/bin/git submodule sync --recursive
2025-12-04T08:57:45.8412687Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive
2025-12-04T08:57:45.8732352Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni'
2025-12-04T08:57:45.8734009Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
2025-12-04T08:57:45.8736875Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv'
2025-12-04T08:57:45.8739617Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
2025-12-04T08:57:45.8742247Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX'
2025-12-04T08:57:45.8745567Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator'
2025-12-04T08:57:45.8748172Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
2025-12-04T08:57:45.8751322Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path
'third_party/aiter' 2025-12-04T08:57:45.8754511Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2025-12-04T08:57:45.8757923Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel' 2025-12-04T08:57:45.8761140Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib' 2025-12-04T08:57:45.8764479Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2025-12-04T08:57:45.8768784Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2025-12-04T08:57:45.8772089Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2025-12-04T08:57:45.8775777Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2025-12-04T08:57:45.8780283Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-12-04T08:57:45.8785933Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers' 2025-12-04T08:57:45.8790159Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2025-12-04T08:57:45.8794457Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:57:45.8798522Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo' 2025-12-04T08:57:45.8802998Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2025-12-04T08:57:45.8807317Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2025-12-04T08:57:45.8811848Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2025-12-04T08:57:45.8816468Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2025-12-04T08:57:45.8822065Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai' 2025-12-04T08:57:45.8827090Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc' 2025-12-04T08:57:45.8832149Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2025-12-04T08:57:45.8837181Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2025-12-04T08:57:45.8842618Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp' 2025-12-04T08:57:45.8847590Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2025-12-04T08:57:45.8852958Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2025-12-04T08:57:45.8858771Z Submodule 'third_party/NNPACK_deps/psimd' 
(https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2025-12-04T08:57:45.8864682Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2025-12-04T08:57:45.8871913Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2025-12-04T08:57:45.8877790Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2025-12-04T08:57:45.8883489Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2025-12-04T08:57:45.8889498Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2025-12-04T08:57:45.8922727Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'... 2025-12-04T08:57:46.1019890Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'... 2025-12-04T08:57:46.1020943Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'... 2025-12-04T08:57:46.1021983Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'... 2025-12-04T08:57:46.1047931Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'... 2025-12-04T08:57:46.1051699Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2025-12-04T08:57:46.1159583Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NVTX'... 2025-12-04T08:57:46.4637301Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'... 2025-12-04T08:57:46.4639059Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'... 2025-12-04T08:57:46.4640774Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'... 2025-12-04T08:57:46.4642214Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'... 2025-12-04T08:57:46.4643877Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2025-12-04T08:57:46.4645700Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'... 2025-12-04T08:57:46.4647423Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2025-12-04T08:57:46.4649078Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'... 2025-12-04T08:57:46.4650713Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kleidiai'... 2025-12-04T08:57:46.5578899Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 2025-12-04T08:57:47.7412032Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpp-httplib'... 2025-12-04T08:57:47.7413089Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention'... 2025-12-04T08:57:47.7414140Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2025-12-04T08:57:47.7415074Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'... 2025-12-04T08:57:47.7416134Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'... 
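Following up on the detached-HEAD notice printed after the checkout further up: because the runner checks out a raw commit id rather than a branch, any commits made in this work tree would be kept only by creating a branch at HEAD. A minimal sketch of the two options the notice describes (the branch name is a hypothetical placeholder, not something this workflow creates):

    # Keep experimental commits made in the detached-HEAD state by branching at HEAD
    # ("my-experiment" is an illustrative name only).
    git switch -c my-experiment
    # ...or return to whatever was checked out before, discarding the detached state.
    git switch -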
2025-12-04T08:57:47.7417236Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/mimalloc'... 2025-12-04T08:57:47.7418141Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'... 2025-12-04T08:57:47.7419026Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'... 2025-12-04T08:57:47.7420091Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2025-12-04T08:57:47.7421392Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2025-12-04T08:57:47.7786745Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'... 2025-12-04T08:57:59.4369956Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'... 2025-12-04T08:57:59.4370947Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'... 2025-12-04T08:57:59.4371765Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'... 2025-12-04T08:57:59.4372548Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'... 2025-12-04T08:57:59.4373442Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'... 2025-12-04T08:57:59.4374421Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'... 2025-12-04T08:57:59.4375271Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'... 2025-12-04T08:57:59.4376103Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'... 2025-12-04T08:57:59.5372245Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter'... 2025-12-04T08:58:02.1686492Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T08:58:02.1814277Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T08:58:02.1915408Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T08:58:02.2174470Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T08:58:02.3052915Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T08:58:02.3649451Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T08:58:03.1339229Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T08:58:03.3314113Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T08:58:03.3337121Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:58:03.3365066Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'... 
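The aiter entry just above is the first place this step recurses a level deeper: the freshly checked-out submodule declares its own 3rdparty/composable_kernel submodule, which the single git submodule update --init --force --recursive invocation from the start of this group initializes in turn. A small sketch, under the assumption of the same work tree, of how that nesting could be inspected (paths taken from the log above):

    # Each submodule's own .gitmodules file declares the next level of nesting;
    # for aiter this lists the 3rdparty/composable_kernel entry seen above.
    git -C third_party/aiter config --file .gitmodules --get-regexp submodule
    # The exact commit the nested submodule is pinned to is recorded as a gitlink
    # entry in the parent submodule's tree:
    git -C third_party/aiter ls-tree HEAD 3rdparty/composable_kernel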
2025-12-04T08:58:08.1229875Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T08:58:08.1479872Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T08:58:08.5192847Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:58:08.5731332Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T08:58:08.6754533Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T08:58:08.7259379Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T08:58:09.4137788Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T08:58:09.5760808Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T08:58:09.5783186Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit' 2025-12-04T08:58:09.5785385Z Submodule 'external/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:58:09.5787878Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:58:09.5790902Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass' 2025-12-04T08:58:09.5793650Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest' 2025-12-04T08:58:09.5796605Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:58:09.5799311Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json' 2025-12-04T08:58:09.5829125Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'... 2025-12-04T08:58:10.7865441Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'... 2025-12-04T08:58:10.7866518Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'... 2025-12-04T08:58:10.7867543Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'... 2025-12-04T08:58:10.8866285Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'... 2025-12-04T08:58:13.9714636Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'... 2025-12-04T08:58:14.0715858Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'... 
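All of these clones, including fbgemm's external dependencies just above, are plain https://github.com/ fetches, and they authenticate through the header configured in the 'Setting up auth for fetching submodules' group earlier; the insteadOf rewrites added there additionally ensure that a submodule declared with an SSH-style git@github.com: URL would be routed through the same authenticated HTTPS remote. A rough equivalent of that configuration, with BASE64_CREDENTIAL standing in for the value masked as *** in the log:

    # Send a basic-auth header on every https://github.com/ fetch
    # (BASE64_CREDENTIAL is a placeholder; the real value is masked in the log).
    git config --global http.https://github.com/.extraheader "AUTHORIZATION: basic ${BASE64_CREDENTIAL}"
    # Rewrite SSH-style GitHub URLs to HTTPS so the header above also covers them.
    git config --global --add url.https://github.com/.insteadOf git@github.com:
    # Show the rewrites currently in effect:
    git config --global --get-regexp 'url\.'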
2025-12-04T08:58:16.5223546Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T08:58:16.9074065Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:58:17.0181595Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T08:58:17.7150776Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T08:58:17.7659993Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:17.7786419Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T08:58:17.8885218Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T08:58:17.9625810Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T08:58:17.9645217Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:58:17.9646752Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:58:17.9675557Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'... 2025-12-04T08:58:22.3417969Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'... 2025-12-04T08:58:22.5932732Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T08:58:23.1875801Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T08:58:23.3319298Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T08:58:23.3638608Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T08:58:23.4065158Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T08:58:23.4323781Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T08:58:23.4793244Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:23.4932683Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T08:58:23.4950631Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2025-12-04T08:58:23.4978026Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 
2025-12-04T08:58:38.6240872Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T08:58:38.6452527Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T08:58:38.7473995Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T08:58:38.7493165Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:58:38.7494661Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:58:38.7497197Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:58:38.7526411Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'... 2025-12-04T08:58:39.4004128Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2025-12-04T08:58:39.8075421Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 2025-12-04T08:58:39.9028066Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T08:58:39.9047902Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:58:39.9049376Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:58:39.9050815Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:58:39.9053871Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:58:39.9057032Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:58:39.9060361Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:58:39.9063518Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:58:39.9066809Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:58:39.9070467Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:58:39.9100202Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'... 2025-12-04T08:58:42.1730684Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'... 
2025-12-04T08:58:42.1732097Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'... 2025-12-04T08:58:42.1733617Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'... 2025-12-04T08:58:42.1734999Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'... 2025-12-04T08:58:42.1736366Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'... 2025-12-04T08:58:42.1737902Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'... 2025-12-04T08:58:42.1739275Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'... 2025-12-04T08:58:42.2731549Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'... 2025-12-04T08:58:46.9478287Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T08:58:46.9679569Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T08:58:47.0064340Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T08:58:47.0207985Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T08:58:47.0224803Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:58:47.0253901Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'... 
2025-12-04T08:58:47.3116064Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T08:58:47.3310612Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T08:58:47.3778985Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:47.4850165Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T08:58:47.5026981Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T08:58:47.5207665Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T08:58:47.5224450Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:47.5227114Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:47.5256148Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T08:58:49.5386822Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'... 2025-12-04T08:58:49.8052398Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T08:58:49.8544853Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:58:49.8876291Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T08:58:49.9346662Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:49.9893914Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T08:58:50.0295897Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T08:58:50.1340528Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T08:58:50.5523918Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T08:58:50.5564010Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2025-12-04T08:58:50.5592468Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 
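Each "checked out <sha>" entry above reflects the commit pinned by the immediate parent project. Once the recursive update completes, the same pins can be listed in one pass from the repository root; this is an illustrative check rather than a step of the job:

# List every submodule (including nested ones) with the commit currently checked out
git submodule status --recursive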
2025-12-04T08:58:51.3267232Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T08:58:51.3983789Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T08:58:51.4002873Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:58:51.4005095Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:58:51.4007646Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:58:51.4010446Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:58:51.4013839Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:58:51.4016584Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:58:51.4020014Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:58:51.4023369Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:58:51.4050730Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'... 2025-12-04T08:58:51.7810787Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'... 2025-12-04T08:58:51.7812149Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'... 2025-12-04T08:58:51.7813413Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'... 2025-12-04T08:58:51.7814592Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'... 2025-12-04T08:58:51.8812424Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'... 2025-12-04T08:58:52.3620685Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'... 2025-12-04T08:58:58.5099627Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'... 
2025-12-04T08:58:59.2463264Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T08:58:59.2891946Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T08:58:59.3067015Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T08:58:59.4139732Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T08:58:59.4285437Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T08:58:59.4437703Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T08:58:59.4617620Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T08:58:59.4636444Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:59.4638600Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:59.4664769Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T08:59:01.3029310Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'... 2025-12-04T08:59:01.5668715Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T08:59:01.6166442Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:59:02.0979502Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T08:59:02.1102239Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T08:59:02.3932146Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T08:59:02.3957529Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:02.3959081Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:02.3987523Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2025-12-04T08:59:02.9039490Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 
2025-12-04T08:59:03.2354561Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T08:59:03.3104924Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T08:59:03.3203431Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T08:59:03.3326597Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T08:59:03.3767782Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T08:59:03.4073946Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T08:59:03.4518902Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T08:59:03.4796065Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T08:59:03.4814255Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:03.4815570Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:03.4818336Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:03.4821214Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:03.4848737Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2025-12-04T08:59:04.3534754Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 2025-12-04T08:59:04.3536013Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2025-12-04T08:59:04.4296538Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2025-12-04T08:59:04.4898495Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T08:59:04.5060718Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T08:59:04.5836674Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T08:59:04.6138455Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T08:59:04.6156033Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:04.6190724Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
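The recursive checkout traced above can be approximated locally with stock git. The checkout action's exact flags are not visible in this excerpt, so the following is only a minimal sketch:

# Minimal local equivalent of the recursive submodule checkout logged above
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git checkout ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32   # commit reported below by `git log -1 --format=%H`
git submodule sync --recursive
git submodule update --init --recursive --jobs 4        # --jobs parallelizes the clones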
2025-12-04T08:59:04.7913579Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T08:59:04.7951098Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T08:59:04.8276873Z Entering 'android/libs/fbjni' 2025-12-04T08:59:04.8320163Z Entering 'third_party/FP16' 2025-12-04T08:59:04.8364147Z Entering 'third_party/FXdiv' 2025-12-04T08:59:04.8408962Z Entering 'third_party/NNPACK' 2025-12-04T08:59:04.8453961Z Entering 'third_party/NVTX' 2025-12-04T08:59:04.8498687Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:04.8542053Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:04.8601505Z Entering 'third_party/aiter' 2025-12-04T08:59:04.8646534Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:04.8698851Z Entering 'third_party/benchmark' 2025-12-04T08:59:04.8741622Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:04.8795761Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:04.8840364Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:04.8885171Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:04.8929736Z Entering 'third_party/cutlass' 2025-12-04T08:59:04.8984051Z Entering 'third_party/fbgemm' 2025-12-04T08:59:04.9030998Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:04.9075547Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:04.9125397Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:04.9173972Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:04.9235762Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:04.9278045Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:04.9319692Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:04.9367465Z Entering 'third_party/flash-attention' 2025-12-04T08:59:04.9411410Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:04.9460253Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:04.9515336Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:04.9563219Z Entering 'third_party/fmt' 2025-12-04T08:59:04.9607967Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:04.9652547Z Entering 'third_party/gloo' 2025-12-04T08:59:04.9696526Z Entering 'third_party/googletest' 2025-12-04T08:59:04.9739982Z Entering 'third_party/ideep' 2025-12-04T08:59:04.9781299Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:04.9835382Z Entering 'third_party/ittapi' 2025-12-04T08:59:04.9877443Z Entering 'third_party/kineto' 2025-12-04T08:59:04.9924129Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:04.9975239Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:05.0018621Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:05.0059792Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:05.0103298Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:05.0145250Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:05.0194455Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:05.0236737Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:05.0279390Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:05.0322598Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:05.0367184Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:05.0417044Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:05.0461205Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:05.0507120Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:05.0556088Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:05.0598601Z Entering 'third_party/kleidiai' 2025-12-04T08:59:05.0642589Z Entering 'third_party/mimalloc' 2025-12-04T08:59:05.0687464Z Entering 'third_party/nlohmann' 2025-12-04T08:59:05.0733307Z Entering 'third_party/onnx' 2025-12-04T08:59:05.0797821Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:05.0842816Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:05.0889812Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:05.0933478Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:05.0975556Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:05.1017383Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:05.1058713Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:05.1099188Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:05.1140528Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:05.1184589Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:05.1230968Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:05.1281687Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:05.1344526Z Entering 'third_party/pocketfft' 2025-12-04T08:59:05.1389126Z Entering 'third_party/protobuf' 2025-12-04T08:59:05.1436842Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:05.1477614Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:05.1522598Z Entering 'third_party/psimd' 2025-12-04T08:59:05.1571080Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:05.1615111Z Entering 'third_party/pybind11' 2025-12-04T08:59:05.1658733Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:05.1701072Z Entering 'third_party/sleef' 2025-12-04T08:59:05.1745054Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:05.1789332Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:05.1830584Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:05.1873863Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:05.1915019Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:05.1957078Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:05.2013384Z ##[endgroup] 2025-12-04T08:59:05.2015169Z ##[group]Persisting credentials for submodules 2025-12-04T08:59:05.2020569Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 
'url.https://github.com/.insteadOf' || :" 2025-12-04T08:59:05.2329800Z Entering 'android/libs/fbjni' 2025-12-04T08:59:05.2390471Z Entering 'third_party/FP16' 2025-12-04T08:59:05.2458203Z Entering 'third_party/FXdiv' 2025-12-04T08:59:05.2518901Z Entering 'third_party/NNPACK' 2025-12-04T08:59:05.2578080Z Entering 'third_party/NVTX' 2025-12-04T08:59:05.2637753Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:05.2696685Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:05.2772696Z Entering 'third_party/aiter' 2025-12-04T08:59:05.2830911Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:05.2898958Z Entering 'third_party/benchmark' 2025-12-04T08:59:05.2957874Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:05.3028667Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:05.3087018Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:05.3144921Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:05.3202684Z Entering 'third_party/cutlass' 2025-12-04T08:59:05.3268326Z Entering 'third_party/fbgemm' 2025-12-04T08:59:05.3329642Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:05.3395739Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:05.3464898Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:05.3523536Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:05.3592847Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:05.3649652Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:05.3705698Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:05.3768872Z Entering 'third_party/flash-attention' 2025-12-04T08:59:05.3830014Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:05.3895596Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:05.3970270Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:05.4030399Z Entering 'third_party/fmt' 2025-12-04T08:59:05.4088969Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:05.4146928Z Entering 'third_party/gloo' 2025-12-04T08:59:05.4204341Z Entering 'third_party/googletest' 2025-12-04T08:59:05.4264667Z Entering 'third_party/ideep' 2025-12-04T08:59:05.4322081Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:05.4392355Z Entering 'third_party/ittapi' 2025-12-04T08:59:05.4454184Z Entering 'third_party/kineto' 2025-12-04T08:59:05.4513079Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:05.4572054Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:05.4628786Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:05.4686420Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:05.4744076Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:05.4800540Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:05.4859885Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:05.4917999Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:05.4974921Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:05.5038386Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:05.5095679Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:05.5157786Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:05.5218925Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:05.5281758Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:05.5338241Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:05.5397609Z Entering 'third_party/kleidiai' 2025-12-04T08:59:05.5458100Z Entering 'third_party/mimalloc' 2025-12-04T08:59:05.5515332Z Entering 'third_party/nlohmann' 2025-12-04T08:59:05.5575628Z Entering 'third_party/onnx' 2025-12-04T08:59:05.5654731Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:05.5717947Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:05.5778409Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:05.5835787Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:05.5897812Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:05.5955701Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:05.6017393Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:05.6074002Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:05.6138178Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:05.6194298Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:05.6252930Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:05.6314305Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:05.6398132Z Entering 'third_party/pocketfft' 2025-12-04T08:59:05.6455277Z Entering 'third_party/protobuf' 2025-12-04T08:59:05.6517094Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:05.6576280Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:05.6640844Z Entering 'third_party/psimd' 2025-12-04T08:59:05.6697633Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:05.6756190Z Entering 'third_party/pybind11' 2025-12-04T08:59:05.6817909Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:05.6877705Z Entering 'third_party/sleef' 2025-12-04T08:59:05.6937980Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:05.6994315Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:05.7055751Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:05.7112554Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:05.7178165Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:05.7232607Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:05.7315439Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T08:59:05.7630618Z Entering 'android/libs/fbjni' 2025-12-04T08:59:05.7684232Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T08:59:05.7700130Z Entering 'third_party/FP16' 2025-12-04T08:59:05.7755136Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T08:59:05.7772825Z Entering 'third_party/FXdiv' 2025-12-04T08:59:05.7826693Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T08:59:05.7844966Z Entering 'third_party/NNPACK' 2025-12-04T08:59:05.7898449Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T08:59:05.7916551Z Entering 'third_party/NVTX' 2025-12-04T08:59:05.7970578Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T08:59:05.7987169Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:05.8042659Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T08:59:05.8058641Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:05.8112765Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T08:59:05.8146655Z Entering 'third_party/aiter' 2025-12-04T08:59:05.8200257Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T08:59:05.8217973Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:05.8268345Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T08:59:05.8297322Z Entering 'third_party/benchmark' 2025-12-04T08:59:05.8353149Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:59:05.8371100Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:05.8422050Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T08:59:05.8449215Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:05.8500108Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T08:59:05.8518101Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:05.8577661Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T08:59:05.8595793Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:05.8647934Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T08:59:05.8663914Z Entering 'third_party/cutlass' 2025-12-04T08:59:05.8717869Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T08:59:05.8745829Z Entering 'third_party/fbgemm' 2025-12-04T08:59:05.8799995Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T08:59:05.8818405Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:05.8874128Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T08:59:05.8891518Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:05.8944170Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T08:59:05.8971679Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:05.9022703Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T08:59:05.9040950Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:05.9092982Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T08:59:05.9121401Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:05.9176728Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T08:59:05.9194453Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:05.9247038Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T08:59:05.9261729Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:05.9315324Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T08:59:05.9336531Z Entering 'third_party/flash-attention' 2025-12-04T08:59:05.9388138Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T08:59:05.9407292Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:05.9458908Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T08:59:05.9482447Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:05.9535627Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T08:59:05.9565070Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:05.9615547Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T08:59:05.9638275Z Entering 'third_party/fmt' 2025-12-04T08:59:05.9690846Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:59:05.9706759Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:05.9762460Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T08:59:05.9778247Z Entering 'third_party/gloo' 2025-12-04T08:59:05.9830039Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T08:59:05.9848671Z Entering 'third_party/googletest' 2025-12-04T08:59:05.9899824Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:05.9918728Z Entering 'third_party/ideep' 2025-12-04T08:59:05.9971541Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T08:59:05.9985877Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:06.0039834Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T08:59:06.0063295Z Entering 'third_party/ittapi' 2025-12-04T08:59:06.0117099Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T08:59:06.0136473Z Entering 'third_party/kineto' 2025-12-04T08:59:06.0186634Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T08:59:06.0205114Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:06.0258571Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T08:59:06.0275945Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:06.0326738Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T08:59:06.0344359Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:06.0397905Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T08:59:06.0415577Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:06.0467625Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:59:06.0486771Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:06.0541393Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T08:59:06.0558790Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:06.0611298Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T08:59:06.0629366Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:06.0683403Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T08:59:06.0698556Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:06.0754835Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:06.0772387Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:06.0823873Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T08:59:06.0844214Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:06.0897730Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T08:59:06.0915417Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:06.0973881Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:59:06.0991917Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.1045601Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:59:06.1063331Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.1117755Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:59:06.1139120Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:06.1193256Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T08:59:06.1216697Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:06.1267299Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T08:59:06.1287471Z Entering 'third_party/kleidiai' 2025-12-04T08:59:06.1339542Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T08:59:06.1358809Z Entering 'third_party/mimalloc' 2025-12-04T08:59:06.1413175Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T08:59:06.1429333Z Entering 'third_party/nlohmann' 2025-12-04T08:59:06.1483760Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T08:59:06.1500983Z Entering 'third_party/onnx' 2025-12-04T08:59:06.1554674Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T08:59:06.1589512Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:06.1641615Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:59:06.1660260Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:06.1715122Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T08:59:06.1732481Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:06.1786720Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:59:06.1804110Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:06.1858719Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:06.1875999Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:06.1926578Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T08:59:06.1942206Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:06.1995083Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T08:59:06.2014018Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:06.2064687Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T08:59:06.2082514Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:06.2133851Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T08:59:06.2153850Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:06.2205872Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:59:06.2220412Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.2274088Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:59:06.2294052Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.2346945Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:59:06.2367327Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:06.2418641Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T08:59:06.2457237Z Entering 'third_party/pocketfft' 2025-12-04T08:59:06.2506832Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T08:59:06.2523628Z Entering 'third_party/protobuf' 2025-12-04T08:59:06.2577837Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T08:59:06.2597796Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:06.2648477Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:59:06.2664063Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:06.2718037Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:06.2739399Z Entering 'third_party/psimd' 2025-12-04T08:59:06.2795790Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T08:59:06.2813048Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:06.2866638Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T08:59:06.2884872Z Entering 'third_party/pybind11' 2025-12-04T08:59:06.2938450Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:59:06.2956406Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:06.3009064Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T08:59:06.3024891Z Entering 'third_party/sleef' 2025-12-04T08:59:06.3078194Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T08:59:06.3095940Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:06.3148821Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T08:59:06.3168666Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:06.3219946Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:06.3238952Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:06.3290286Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T08:59:06.3305219Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:06.3360904Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T08:59:06.3378193Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:06.3430641Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:59:06.3447875Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:06.3498872Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T08:59:06.4200435Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T08:59:06.4518171Z Entering 'android/libs/fbjni' 2025-12-04T08:59:06.4562267Z Entering 'third_party/FP16' 2025-12-04T08:59:06.4608011Z Entering 'third_party/FXdiv' 2025-12-04T08:59:06.4653712Z Entering 'third_party/NNPACK' 2025-12-04T08:59:06.4697791Z Entering 'third_party/NVTX' 2025-12-04T08:59:06.4742046Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:06.4785770Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:06.4848063Z Entering 'third_party/aiter' 2025-12-04T08:59:06.4893305Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:06.4946889Z Entering 'third_party/benchmark' 2025-12-04T08:59:06.4993281Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:06.5045913Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:06.5089657Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:06.5136961Z Entering 
'third_party/cudnn_frontend' 2025-12-04T08:59:06.5179101Z Entering 'third_party/cutlass' 2025-12-04T08:59:06.5235516Z Entering 'third_party/fbgemm' 2025-12-04T08:59:06.5280814Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:06.5322922Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:06.5377526Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:06.5419370Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:06.5468541Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:06.5514475Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:06.5558615Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:06.5605667Z Entering 'third_party/flash-attention' 2025-12-04T08:59:06.5650822Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:06.5702424Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:06.5758389Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:06.5807060Z Entering 'third_party/fmt' 2025-12-04T08:59:06.5852866Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:06.5897693Z Entering 'third_party/gloo' 2025-12-04T08:59:06.5941114Z Entering 'third_party/googletest' 2025-12-04T08:59:06.5984519Z Entering 'third_party/ideep' 2025-12-04T08:59:06.6027135Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:06.6079293Z Entering 'third_party/ittapi' 2025-12-04T08:59:06.6122583Z Entering 'third_party/kineto' 2025-12-04T08:59:06.6167536Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:06.6210860Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:06.6263259Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:06.6306389Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:06.6351478Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:06.6394689Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:06.6439911Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:06.6483274Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:06.6527625Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:06.6573761Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:06.6617187Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:06.6658357Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.6702599Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.6750696Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:06.6792906Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:06.6837102Z Entering 'third_party/kleidiai' 2025-12-04T08:59:06.6882205Z Entering 'third_party/mimalloc' 2025-12-04T08:59:06.6945665Z Entering 'third_party/nlohmann' 2025-12-04T08:59:06.6991878Z Entering 'third_party/onnx' 2025-12-04T08:59:06.7053291Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:06.7097955Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:06.7142099Z Entering 
'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:06.7183387Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:06.7224923Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:06.7267045Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:06.7315208Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:06.7358436Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:06.7401708Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:06.7443114Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.7492001Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.7538915Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:06.7603623Z Entering 'third_party/pocketfft' 2025-12-04T08:59:06.7647826Z Entering 'third_party/protobuf' 2025-12-04T08:59:06.7694328Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:06.7737343Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:06.7779248Z Entering 'third_party/psimd' 2025-12-04T08:59:06.7823047Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:06.7867734Z Entering 'third_party/pybind11' 2025-12-04T08:59:06.7913435Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:06.7958528Z Entering 'third_party/sleef' 2025-12-04T08:59:06.8002211Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:06.8046011Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:06.8096881Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:06.8138005Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:06.8178783Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:06.8219591Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:06.8281637Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T08:59:06.8602912Z Entering 'android/libs/fbjni' 2025-12-04T08:59:06.8645673Z Entering 'third_party/FP16' 2025-12-04T08:59:06.8689708Z Entering 'third_party/FXdiv' 2025-12-04T08:59:06.8737995Z Entering 'third_party/NNPACK' 2025-12-04T08:59:06.8780198Z Entering 'third_party/NVTX' 2025-12-04T08:59:06.8827439Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:06.8873038Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:06.8933649Z Entering 'third_party/aiter' 2025-12-04T08:59:06.8977171Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:06.9028896Z Entering 'third_party/benchmark' 2025-12-04T08:59:06.9074920Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:06.9125782Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:06.9172052Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:06.9217960Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:06.9261730Z Entering 'third_party/cutlass' 2025-12-04T08:59:06.9317108Z Entering 'third_party/fbgemm' 2025-12-04T08:59:06.9364991Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:06.9408554Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:06.9462125Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:06.9504270Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:06.9557055Z 
Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:06.9598769Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:06.9641014Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:06.9687254Z Entering 'third_party/flash-attention' 2025-12-04T08:59:06.9733647Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:06.9780701Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:06.9839721Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:06.9887026Z Entering 'third_party/fmt' 2025-12-04T08:59:06.9935529Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:06.9978044Z Entering 'third_party/gloo' 2025-12-04T08:59:07.0019827Z Entering 'third_party/googletest' 2025-12-04T08:59:07.0063439Z Entering 'third_party/ideep' 2025-12-04T08:59:07.0106739Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:07.0159887Z Entering 'third_party/ittapi' 2025-12-04T08:59:07.0203647Z Entering 'third_party/kineto' 2025-12-04T08:59:07.0248264Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:07.0291303Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:07.0338782Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:07.0380812Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:07.0424005Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:07.0467496Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:07.0517129Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:07.0560835Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:07.0603474Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:07.0648584Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:07.0695967Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:07.0738576Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.0791836Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.0846921Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:07.0895623Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:07.0938896Z Entering 'third_party/kleidiai' 2025-12-04T08:59:07.0983254Z Entering 'third_party/mimalloc' 2025-12-04T08:59:07.1026215Z Entering 'third_party/nlohmann' 2025-12-04T08:59:07.1073704Z Entering 'third_party/onnx' 2025-12-04T08:59:07.1137217Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:07.1190169Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:07.1236654Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:07.1277391Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:07.1318863Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:07.1369359Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:07.1418609Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:07.1461379Z Entering 
'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:07.1506097Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:07.1553629Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.1596720Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.1660615Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:07.1702470Z Entering 'third_party/pocketfft' 2025-12-04T08:59:07.1747388Z Entering 'third_party/protobuf' 2025-12-04T08:59:07.1795294Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:07.1837717Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:07.1882008Z Entering 'third_party/psimd' 2025-12-04T08:59:07.1925285Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:07.1971012Z Entering 'third_party/pybind11' 2025-12-04T08:59:07.2016620Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:07.2058982Z Entering 'third_party/sleef' 2025-12-04T08:59:07.2103133Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:07.2146111Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:07.2196515Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:07.2238633Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:07.2281431Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:07.2322760Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:07.2379497Z ##[endgroup] 2025-12-04T08:59:07.2414847Z [command]/usr/bin/git log -1 --format=%H 2025-12-04T08:59:07.2440358Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:59:07.2548076Z ##[group]Run cd "${GITHUB_WORKSPACE}" 2025-12-04T08:59:07.2548487Z cd "${GITHUB_WORKSPACE}" 2025-12-04T08:59:07.2548854Z # Clean stale submodule dirs 2025-12-04T08:59:07.2549311Z if [ -z "${NO_SUDO}" ]; then 2025-12-04T08:59:07.2549747Z  sudo git submodule foreach --recursive git clean -ffdx 2025-12-04T08:59:07.2550173Z else 2025-12-04T08:59:07.2550509Z  git submodule foreach --recursive git clean -ffdx 2025-12-04T08:59:07.2550901Z fi 2025-12-04T08:59:07.2558768Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:07.2559168Z env: 2025-12-04T08:59:07.2559519Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:07.2559801Z NO_SUDO: true 2025-12-04T08:59:07.2560134Z ##[endgroup] 2025-12-04T08:59:07.2909011Z Entering 'android/libs/fbjni' 2025-12-04T08:59:07.2942907Z Entering 'third_party/FP16' 2025-12-04T08:59:07.2979500Z Entering 'third_party/FXdiv' 2025-12-04T08:59:07.3011643Z Entering 'third_party/NNPACK' 2025-12-04T08:59:07.3047627Z Entering 'third_party/NVTX' 2025-12-04T08:59:07.3087887Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:07.3121958Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:07.3246709Z Entering 'third_party/aiter' 2025-12-04T08:59:07.3291676Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:07.3399132Z Entering 'third_party/benchmark' 2025-12-04T08:59:07.3435274Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:07.3552188Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:07.3588548Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:07.3624751Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:07.3660203Z Entering 'third_party/cutlass' 2025-12-04T08:59:07.3764273Z Entering 'third_party/fbgemm' 2025-12-04T08:59:07.3826025Z Entering 
'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:07.3859113Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:07.3977403Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:07.4017862Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:07.4115229Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:07.4153909Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:07.4182694Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:07.4228253Z Entering 'third_party/flash-attention' 2025-12-04T08:59:07.4268405Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:07.4365027Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:07.4452886Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:07.4521362Z Entering 'third_party/fmt' 2025-12-04T08:59:07.4555541Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:07.4590961Z Entering 'third_party/gloo' 2025-12-04T08:59:07.4626820Z Entering 'third_party/googletest' 2025-12-04T08:59:07.4662085Z Entering 'third_party/ideep' 2025-12-04T08:59:07.4697575Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:07.4781351Z Entering 'third_party/ittapi' 2025-12-04T08:59:07.4822698Z Entering 'third_party/kineto' 2025-12-04T08:59:07.4858722Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:07.4904416Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:07.4953795Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:07.4986499Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:07.5019590Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:07.5054436Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:07.5087540Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:07.5119180Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:07.5158161Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:07.5199305Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:07.5236306Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:07.5271880Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.5318707Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.5359447Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:07.5392741Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:07.5432455Z Entering 'third_party/kleidiai' 2025-12-04T08:59:07.5474138Z Entering 'third_party/mimalloc' 2025-12-04T08:59:07.5514504Z Entering 'third_party/nlohmann' 2025-12-04T08:59:07.5563082Z Entering 'third_party/onnx' 2025-12-04T08:59:07.5860347Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:07.5897774Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:07.5954488Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:07.5986094Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:07.6019543Z Entering 
'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:07.6055590Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:07.6100552Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:07.6132373Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:07.6165230Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:07.6196526Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.6244700Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.6280559Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:07.6518441Z Entering 'third_party/pocketfft' 2025-12-04T08:59:07.6551366Z Entering 'third_party/protobuf' 2025-12-04T08:59:07.6626822Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:07.6665713Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:07.6701328Z Entering 'third_party/psimd' 2025-12-04T08:59:07.6734922Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:07.6767409Z Entering 'third_party/pybind11' 2025-12-04T08:59:07.6802520Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:07.6836119Z Entering 'third_party/sleef' 2025-12-04T08:59:07.6872833Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:07.6907577Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:07.6940617Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:07.6973774Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:07.7011085Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:07.7042402Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:07.7211282Z Prepare all required actions 2025-12-04T08:59:07.7211820Z Getting action download info 2025-12-04T08:59:07.8709847Z ##[group]Run ./.github/actions/setup-linux 2025-12-04T08:59:07.8710167Z env: 2025-12-04T08:59:07.8710398Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:07.8710671Z ##[endgroup] 2025-12-04T08:59:07.8750698Z ##[group]Run set -euo pipefail 2025-12-04T08:59:07.8751053Z set -euo pipefail 2025-12-04T08:59:07.8751366Z function get_ec2_metadata() { 2025-12-04T08:59:07.8751763Z  # Pulled from instance metadata endpoint for EC2 2025-12-04T08:59:07.8752416Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2025-12-04T08:59:07.8753026Z  category=$1 2025-12-04T08:59:07.8753405Z  # If it is GCP runner (runner name contains gcp), do not run this 2025-12-04T08:59:07.8753851Z  runner_name_str=i-035b9d8fd6b020edf 2025-12-04T08:59:07.8754250Z  if [[ -f /.inarc ]]; then 2025-12-04T08:59:07.8754602Z  echo "ARC Runner, no info on ec2 metadata" 2025-12-04T08:59:07.8755015Z  elif [[ $runner_name_str == *"gcp"* ]]; then 2025-12-04T08:59:07.8755509Z  echo "Runner is from Google Cloud Platform, No info on ec2 metadata" 2025-12-04T08:59:07.8755964Z  else 2025-12-04T08:59:07.8756858Z  curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2025-12-04T08:59:07.8757822Z  fi 2025-12-04T08:59:07.8758045Z } 2025-12-04T08:59:07.8758470Z echo "ami-id: $(get_ec2_metadata ami-id)" 2025-12-04T08:59:07.8758904Z echo "instance-id: $(get_ec2_metadata instance-id)" 2025-12-04T08:59:07.8759403Z 
echo "instance-type: $(get_ec2_metadata instance-type)" 2025-12-04T08:59:07.8759835Z echo "system info $(uname -a)" 2025-12-04T08:59:07.8765775Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:07.8766172Z env: 2025-12-04T08:59:07.8766399Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:07.8766659Z ##[endgroup] 2025-12-04T08:59:07.8917615Z ami-id: ami-08982f1c5bf93d976 2025-12-04T08:59:07.9028578Z instance-id: i-035b9d8fd6b020edf 2025-12-04T08:59:07.9139445Z instance-type: g4dn.12xlarge 2025-12-04T08:59:07.9150360Z system info Linux ip-10-1-59-14.ec2.internal 6.1.150-174.273.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep 9 12:21:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-12-04T08:59:07.9170496Z ##[group]Run if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T08:59:07.9171014Z if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T08:59:07.9177380Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:07.9177825Z env: 2025-12-04T08:59:07.9178081Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:07.9178376Z ##[endgroup] 2025-12-04T08:59:09.9893605Z Thu Dec 4 08:59:09 2025 2025-12-04T08:59:09.9894613Z +-----------------------------------------------------------------------------------------+ 2025-12-04T08:59:09.9895367Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T08:59:09.9895983Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:09.9896718Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T08:59:09.9897595Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T08:59:09.9898143Z | | | MIG M. | 2025-12-04T08:59:09.9898589Z |=========================================+========================+======================| 2025-12-04T08:59:10.0276819Z | 0 Tesla T4 Off | 00000000:00:1B.0 Off | 0 | 2025-12-04T08:59:10.0278325Z | N/A 36C P0 25W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T08:59:10.0278856Z | | | N/A | 2025-12-04T08:59:10.0279353Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:10.0279895Z | 1 Tesla T4 Off | 00000000:00:1C.0 Off | 0 | 2025-12-04T08:59:10.0280413Z | N/A 35C P0 25W / 70W | 0MiB / 15360MiB | 4% Default | 2025-12-04T08:59:10.0280862Z | | | N/A | 2025-12-04T08:59:10.0281355Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:10.0281895Z | 2 Tesla T4 Off | 00000000:00:1D.0 Off | 0 | 2025-12-04T08:59:10.0282404Z | N/A 34C P0 25W / 70W | 0MiB / 15360MiB | 4% Default | 2025-12-04T08:59:10.0282868Z | | | N/A | 2025-12-04T08:59:10.0283354Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:10.0283888Z | 3 Tesla T4 Off | 00000000:00:1E.0 Off | 0 | 2025-12-04T08:59:10.0284390Z | N/A 35C P0 25W / 70W | 0MiB / 15360MiB | 4% Default | 2025-12-04T08:59:10.0284857Z | | | N/A | 2025-12-04T08:59:10.0285340Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:10.0285833Z 2025-12-04T08:59:10.0286051Z +-----------------------------------------------------------------------------------------+ 2025-12-04T08:59:10.0286568Z | Processes: | 2025-12-04T08:59:10.0287112Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T08:59:10.0287616Z | ID ID Usage | 2025-12-04T08:59:10.0288035Z 
|=========================================================================================| 2025-12-04T08:59:10.0300443Z | No running processes found | 2025-12-04T08:59:10.0301061Z +-----------------------------------------------------------------------------------------+ 2025-12-04T08:59:11.6945016Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T08:59:11.6946148Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T08:59:11.6953098Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:11.6953519Z env: 2025-12-04T08:59:11.6953746Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:11.6954034Z ##[endgroup] 2025-12-04T08:59:11.7012550Z ##[group]Run if systemctl is-active --quiet docker; then 2025-12-04T08:59:11.7013061Z if systemctl is-active --quiet docker; then 2025-12-04T08:59:11.7013507Z  echo "Docker daemon is running..."; 2025-12-04T08:59:11.7013867Z else 2025-12-04T08:59:11.7014273Z  echo "Starting docker daemon..." && sudo systemctl start docker; 2025-12-04T08:59:11.7014760Z fi 2025-12-04T08:59:11.7021174Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:11.7021640Z env: 2025-12-04T08:59:11.7021897Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:11.7022191Z ##[endgroup] 2025-12-04T08:59:11.7105684Z Docker daemon is running... 2025-12-04T08:59:11.7148768Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T08:59:11.7149208Z with: 2025-12-04T08:59:11.7149416Z shell: bash 2025-12-04T08:59:11.7149788Z timeout_minutes: 5 2025-12-04T08:59:11.7150047Z max_attempts: 3 2025-12-04T08:59:11.7150277Z retry_wait_seconds: 30 2025-12-04T08:59:11.7152683Z command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" # For LF Runners we need to make sure we also login to Meta's ECR docker registry too. META_AWS_ACCOUNT_ID=308535385114 if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" fi 2025-12-04T08:59:11.7155124Z polling_interval_seconds: 1 2025-12-04T08:59:11.7155421Z warning_on_retry: true 2025-12-04T08:59:11.7155693Z continue_on_error: false 2025-12-04T08:59:11.7155942Z env: 2025-12-04T08:59:11.7156168Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:11.7156444Z AWS_RETRY_MODE: standard 2025-12-04T08:59:11.7156700Z AWS_MAX_ATTEMPTS: 5 2025-12-04T08:59:11.7156964Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T08:59:11.7157250Z ##[endgroup] 2025-12-04T08:59:12.8654343Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:12.8655075Z Configure a credential helper to remove this warning. See 2025-12-04T08:59:12.8655739Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:12.8656184Z 2025-12-04T08:59:12.8656427Z Login Succeeded 2025-12-04T08:59:13.3948871Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:13.3950077Z Configure a credential helper to remove this warning. 
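For reference, the ECR login that nick-fields/retry wraps in the step above amounts to fetching a short-lived registry password and piping it into docker login, first for the account the runner's credentials resolve to and then for Meta's account (308535385114) when the two differ, so LF runners can still pull the CI images. A minimal stand-alone sketch of that flow, assuming the aws CLI and docker are on PATH and using --query in place of the grep/cut shown in the log:

#!/usr/bin/env bash
set -euo pipefail

# Region and Meta's ECR account as they appear in the retry step above.
region="${AWS_DEFAULT_REGION:-us-east-1}"
meta_account_id=308535385114

# Account that the runner's AWS credentials belong to.
account_id="$(aws sts get-caller-identity --query Account --output text)"

ecr_login() {
  # get-login-password emits a temporary token; --password-stdin keeps it off argv.
  aws ecr get-login-password --region "$region" \
    | docker login --username AWS --password-stdin "$1.dkr.ecr.${region}.amazonaws.com"
}

ecr_login "$account_id"

# LF runners live in a different AWS account, so also log in to Meta's
# registry, which is where the pytorch/ci-image repositories are hosted.
if [ "$account_id" != "$meta_account_id" ]; then
  ecr_login "$meta_account_id"
fi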
See 2025-12-04T08:59:13.3950739Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:13.3951183Z 2025-12-04T08:59:13.3951285Z Login Succeeded 2025-12-04T08:59:13.8082220Z Command completed after 1 attempt(s). 2025-12-04T08:59:13.8135533Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:59:13.8136121Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:59:13.8136899Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:59:13.8145179Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:13.8145581Z env: 2025-12-04T08:59:13.8145801Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:13.8146079Z ##[endgroup] 2025-12-04T08:59:13.8228714Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T08:59:13.8229377Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T08:59:13.8229881Z # shellcheck disable=SC2046 2025-12-04T08:59:13.8230275Z docker stop $(docker ps -q) || true 2025-12-04T08:59:13.8230682Z # Prune all of the docker images 2025-12-04T08:59:13.8231064Z docker system prune -af 2025-12-04T08:59:13.8236994Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:13.8237388Z env: 2025-12-04T08:59:13.8237603Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:13.8237876Z ##[endgroup] 2025-12-04T08:59:13.8477610Z "docker stop" requires at least 1 argument. 2025-12-04T08:59:13.8478405Z See 'docker stop --help'. 2025-12-04T08:59:13.8478668Z 2025-12-04T08:59:13.8478893Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 2025-12-04T08:59:13.8479226Z 2025-12-04T08:59:13.8479356Z Stop one or more running containers 2025-12-04T08:59:13.8629871Z Total reclaimed space: 0B 2025-12-04T08:59:13.8830026Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-12-04T08:59:13.8830587Z with: 2025-12-04T08:59:13.8831518Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8832673Z use-custom-docker-registry: true 2025-12-04T08:59:13.8833025Z docker-build-dir: .ci/docker 2025-12-04T08:59:13.8833355Z docker-build-script: ./build.sh 2025-12-04T08:59:13.8833691Z working-directory: . 2025-12-04T08:59:13.8834086Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:13.8834539Z force-push: false 2025-12-04T08:59:13.8834785Z env: 2025-12-04T08:59:13.8835025Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:13.8835315Z ##[endgroup] 2025-12-04T08:59:13.8855214Z ##[group]Run set -ex 2025-12-04T08:59:13.8855532Z set -ex 2025-12-04T08:59:13.8855781Z  2025-12-04T08:59:13.8856260Z # If the docker build directory or the build script doesn't exist, the action will 2025-12-04T08:59:13.8857339Z # gracefully return the docker image name as it is. 
Pulling docker image in Linux 2025-12-04T08:59:13.8858014Z # job could then download the pre-built image as usual 2025-12-04T08:59:13.8858832Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-12-04T08:59:13.8859581Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8859967Z else 2025-12-04T08:59:13.8860269Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8860790Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8861252Z  2025-12-04T08:59:13.8861903Z  echo "Not using custom ECR registry. Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 2025-12-04T08:59:13.8862665Z  exit 0 2025-12-04T08:59:13.8863062Z fi 2025-12-04T08:59:13.8863305Z  2025-12-04T08:59:13.8863700Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-12-04T08:59:13.8864410Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-12-04T08:59:13.8865026Z  # use it as it is, but first let's extract the tag 2025-12-04T08:59:13.8865588Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-12-04T08:59:13.8866182Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8866740Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8867215Z else 2025-12-04T08:59:13.8867516Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-12-04T08:59:13.8867960Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-12-04T08:59:13.8868409Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-12-04T08:59:13.8868904Z  fi 2025-12-04T08:59:13.8869516Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-12-04T08:59:13.8870181Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8871067Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8871865Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8872347Z fi 2025-12-04T08:59:13.8879042Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:13.8879473Z env: 2025-12-04T08:59:13.8879721Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:13.8880009Z REPO_NAME: pytorch 2025-12-04T08:59:13.8881174Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8882164Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T08:59:13.8882590Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-12-04T08:59:13.8882972Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:13.8883391Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-12-04T08:59:13.8883698Z CUSTOM_TAG_PREFIX: 2025-12-04T08:59:13.8883936Z ##[endgroup] 2025-12-04T08:59:13.8907979Z + [[ -d .ci/docker ]] 2025-12-04T08:59:13.8908296Z + [[ -f .ci/docker/./build.sh ]] 2025-12-04T08:59:13.8908647Z + [[ true == \t\r\u\e ]] 2025-12-04T08:59:13.8909056Z + echo skip=false 2025-12-04T08:59:13.8910278Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-12-04T08:59:13.8916530Z ++ echo 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8917484Z ++ awk -F '[:,]' '{print $2}' 2025-12-04T08:59:13.8940003Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8941055Z + echo docker-tag=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8942592Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8965450Z ##[group]Run set +e 2025-12-04T08:59:13.8965759Z set +e 2025-12-04T08:59:13.8965983Z set -x 2025-12-04T08:59:13.8966214Z  2025-12-04T08:59:13.8966434Z login() { 2025-12-04T08:59:13.8966916Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T08:59:13.8967462Z } 2025-12-04T08:59:13.8967688Z  2025-12-04T08:59:13.8968007Z retry () { 2025-12-04T08:59:13.8968289Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T08:59:13.8968618Z } 2025-12-04T08:59:13.8968816Z  2025-12-04T08:59:13.8969056Z retry login "${DOCKER_REGISTRY}" 2025-12-04T08:59:13.8969374Z  2025-12-04T08:59:13.8969603Z START_TIME=$(date +%s) 2025-12-04T08:59:13.8969908Z # Wait up to 120 minutes 2025-12-04T08:59:13.8970289Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-12-04T08:59:13.8970794Z  # Check if image already exists, if it does then skip building it 2025-12-04T08:59:13.8971312Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-12-04T08:59:13.8971691Z  exit 0 2025-12-04T08:59:13.8971934Z  fi 2025-12-04T08:59:13.8972144Z  2025-12-04T08:59:13.8972549Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-12-04T08:59:13.8973254Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-12-04T08:59:13.8973938Z  # latter, it will wait for the Docker images to become available before continuing 2025-12-04T08:59:13.8974487Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-12-04T08:59:13.8974911Z  # It's a Docker build job, let's build the image 2025-12-04T08:59:13.8975279Z  break 2025-12-04T08:59:13.8975511Z  else 2025-12-04T08:59:13.8975870Z  # It's a regular build job, wait for the image to become available 2025-12-04T08:59:13.8976401Z  sleep 300 2025-12-04T08:59:13.8976656Z  fi 2025-12-04T08:59:13.8977070Z done 2025-12-04T08:59:13.8977314Z  2025-12-04T08:59:13.8977724Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-12-04T08:59:13.8978530Z # be empty. 
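The docker-tag derivation traced a few lines up is plain string handling: when the requested image name already carries the ECR registry and repo prefix, the tag is everything after the first colon; otherwise the tag is the git tree hash of the .ci/docker directory, optionally prefixed by a custom tag, so the tag only changes when the Docker build inputs change. A hedged sketch of that branching (derive_docker_tag is a name of my choosing, and parameter expansion stands in for the awk call in the log):

#!/usr/bin/env bash
set -euo pipefail

# Mirrors the values from the step environment above.
DOCKER_REGISTRY=308535385114.dkr.ecr.us-east-1.amazonaws.com
REPO_NAME=pytorch
DOCKER_BUILD_DIR=.ci/docker

derive_docker_tag() {
  local image_name=$1 custom_prefix=""
  if [[ "$image_name" == *"$DOCKER_REGISTRY/$REPO_NAME"* ]]; then
    # Fully qualified already: the tag is whatever follows the first colon.
    echo "${image_name#*:}"
  else
    # Short name: tag = [<prefix>-]<tree hash of the Docker build dir>.
    if [[ "$image_name" == *:* ]]; then
      custom_prefix="${image_name#*:}-"
    fi
    echo "${custom_prefix}$(git rev-parse "HEAD:${DOCKER_BUILD_DIR}")"
  fi
}

derive_docker_tag "$DOCKER_REGISTRY/$REPO_NAME/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a"
# -> pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a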
The default action would be to continue rebuild the image 2025-12-04T08:59:13.8979144Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-12-04T08:59:13.8979679Z  # if we're on the base branch then use the parent commit 2025-12-04T08:59:13.8980136Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-12-04T08:59:13.8980503Z else 2025-12-04T08:59:13.8980880Z  # otherwise we're on a PR, so use the most recent base commit 2025-12-04T08:59:13.8981437Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-12-04T08:59:13.8981846Z fi 2025-12-04T08:59:13.8982087Z  2025-12-04T08:59:13.8982351Z if [[ -z "${MERGE_BASE}" ]]; then 2025-12-04T08:59:13.8982755Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8983140Z  2025-12-04T08:59:13.8983684Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-12-04T08:59:13.8984327Z  exit 0 2025-12-04T08:59:13.8984586Z fi 2025-12-04T08:59:13.8984821Z  2025-12-04T08:59:13.8985172Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-12-04T08:59:13.8985962Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-12-04T08:59:13.8986653Z  exit 1 2025-12-04T08:59:13.8986907Z fi 2025-12-04T08:59:13.8987129Z  2025-12-04T08:59:13.8987547Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-12-04T08:59:13.8988316Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-12-04T08:59:13.8989230Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-12-04T08:59:13.8989929Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-12-04T08:59:13.8990743Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-12-04T08:59:13.8991280Z fi 2025-12-04T08:59:13.8991483Z  2025-12-04T08:59:13.8991752Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8997189Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:13.8997583Z env: 2025-12-04T08:59:13.8997797Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:13.8998081Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T08:59:13.8998439Z BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:59:13.8999385Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.9000562Z DOCKER_TAG: pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.9001265Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:13.9001673Z DOCKER_PUSH: 2025-12-04T08:59:13.9001901Z ##[endgroup] 2025-12-04T08:59:13.9025713Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:13.9026230Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:13.9028394Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:59:13.9029765Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:14.4339967Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:14.4340679Z Configure a credential helper to remove this warning. 
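Taken together, the script listed in the step above first polls ECR for up to two hours so that regular build jobs simply wait for the Docker build workflow to publish the image (only DOCKER_PUSH=true jobs break out and build it themselves), and only afterwards falls back to the merge-base comparison to decide whether a local rebuild is warranted. A condensed sketch of the regular-job polling path, with wait_for_image as an assumed helper name and the same 120-minute window and 5-minute sleep as the step above:

#!/usr/bin/env bash
set -euo pipefail

# Poll the registry until the manifest for the image is visible or the
# two-hour deadline expires.
wait_for_image() {
  local image=$1
  local deadline=$(( $(date +%s) + 7200 ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if docker manifest inspect "$image" >/dev/null 2>&1; then
      return 0    # image already published, nothing to build locally
    fi
    sleep 300     # regular jobs just keep waiting for the Docker build workflow
  done
  return 1        # never appeared; the caller decides whether to rebuild
}

if ! wait_for_image "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a"; then
  echo "rebuild=true" >> "${GITHUB_OUTPUT:-/dev/stdout}"
fi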
See 2025-12-04T08:59:14.4341335Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:14.4341805Z 2025-12-04T08:59:14.4342269Z Login Succeeded 2025-12-04T08:59:14.4359812Z ++ date +%s 2025-12-04T08:59:14.4367687Z + START_TIME=1764838754 2025-12-04T08:59:14.4371635Z ++ date +%s 2025-12-04T08:59:14.4381251Z + [[ 1764831554 -lt 1764838754 ]] 2025-12-04T08:59:14.4382359Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:14.6356145Z { 2025-12-04T08:59:14.6356484Z "schemaVersion": 2, 2025-12-04T08:59:14.6356976Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-12-04T08:59:14.6357649Z "config": { 2025-12-04T08:59:14.6358038Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-12-04T08:59:14.6358512Z "size": 34864, 2025-12-04T08:59:14.6358983Z "digest": "sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301" 2025-12-04T08:59:14.6359520Z }, 2025-12-04T08:59:14.6359744Z "layers": [ 2025-12-04T08:59:14.6360088Z { 2025-12-04T08:59:14.6360447Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6360924Z "size": 30447951, 2025-12-04T08:59:14.6361422Z "digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63" 2025-12-04T08:59:14.6361951Z }, 2025-12-04T08:59:14.6362164Z { 2025-12-04T08:59:14.6362579Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6363048Z "size": 1554, 2025-12-04T08:59:14.6363502Z "digest": "sha256:0678d56345c994444b77bb70b1177189d23e794748b1d75ffc45d227c7dea94a" 2025-12-04T08:59:14.6364021Z }, 2025-12-04T08:59:14.6364233Z { 2025-12-04T08:59:14.6364585Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6365054Z "size": 313275661, 2025-12-04T08:59:14.6365543Z "digest": "sha256:45f5c9ddfce78349dff3d5edfbaa0310ae17311f66abdcd7e00fa21b500e801c" 2025-12-04T08:59:14.6366071Z }, 2025-12-04T08:59:14.6366280Z { 2025-12-04T08:59:14.6366641Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6367094Z "size": 787, 2025-12-04T08:59:14.6367555Z "digest": "sha256:086b1df51ac1162d9c45698e9dfaf91c6c222c8bd9ab01797ac8f9344bc8044f" 2025-12-04T08:59:14.6368092Z }, 2025-12-04T08:59:14.6368527Z { 2025-12-04T08:59:14.6368891Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6369359Z "size": 106, 2025-12-04T08:59:14.6369811Z "digest": "sha256:fe8a7b64bf98352f89057bcba66beef2fb44cc05fbd3606abccd8e86cf476234" 2025-12-04T08:59:14.6370399Z }, 2025-12-04T08:59:14.6370610Z { 2025-12-04T08:59:14.6370971Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6371422Z "size": 703, 2025-12-04T08:59:14.6371874Z "digest": "sha256:7680723e9a578033dd106b45784c639f06cc8adb1f5239ec513d9de01087c1af" 2025-12-04T08:59:14.6372401Z }, 2025-12-04T08:59:14.6372597Z { 2025-12-04T08:59:14.6372957Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6373423Z "size": 1216, 2025-12-04T08:59:14.6373864Z "digest": "sha256:9c5027aeeb4e3101f48c1d2e400c387110e1009e42497ee801f1b4b7f7edb5c0" 2025-12-04T08:59:14.6374389Z }, 2025-12-04T08:59:14.6374608Z { 2025-12-04T08:59:14.6374959Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6375421Z "size": 483, 2025-12-04T08:59:14.6375866Z "digest": 
"sha256:9a56521103600bd37a1e7c1191b5136c2d738c092f8a6701499f7068a32c2628" 2025-12-04T08:59:14.6376500Z }, 2025-12-04T08:59:14.6376703Z { 2025-12-04T08:59:14.6377238Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6377723Z "size": 110361875, 2025-12-04T08:59:14.6378187Z "digest": "sha256:375c4427e9141269458333b1463fdb219e736fd6231ec1c56c625c48437ace77" 2025-12-04T08:59:14.6378726Z }, 2025-12-04T08:59:14.6378942Z { 2025-12-04T08:59:14.6379301Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6379780Z "size": 4961, 2025-12-04T08:59:14.6380253Z "digest": "sha256:a86faaa7dbdd70e678e5ea20072637ee42618921ca8f80ca089f789325d4b0c2" 2025-12-04T08:59:14.6380785Z }, 2025-12-04T08:59:14.6381005Z { 2025-12-04T08:59:14.6381514Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6381993Z "size": 1755, 2025-12-04T08:59:14.6382464Z "digest": "sha256:fb7848686804957915d98f8655ef6da0fe4c521b50a82aefdebf475983505a15" 2025-12-04T08:59:14.6383013Z }, 2025-12-04T08:59:14.6383231Z { 2025-12-04T08:59:14.6383604Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6384083Z "size": 724, 2025-12-04T08:59:14.6384551Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:14.6385077Z }, 2025-12-04T08:59:14.6385299Z { 2025-12-04T08:59:14.6385674Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6386141Z "size": 543, 2025-12-04T08:59:14.6386612Z "digest": "sha256:79dc80f426b29d4ae9157b967050b03e66aa0c4b1295b944a1dd70106be87066" 2025-12-04T08:59:14.6387162Z }, 2025-12-04T08:59:14.6387385Z { 2025-12-04T08:59:14.6387753Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6388377Z "size": 3185190117, 2025-12-04T08:59:14.6388872Z "digest": "sha256:a13fcc1b90bb9c251ebe7ef2a03c4cb3afa1c8bdafe84f5f85136773059a3735" 2025-12-04T08:59:14.6389406Z }, 2025-12-04T08:59:14.6389623Z { 2025-12-04T08:59:14.6389993Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6390444Z "size": 32, 2025-12-04T08:59:14.6390904Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6391449Z }, 2025-12-04T08:59:14.6391647Z { 2025-12-04T08:59:14.6392008Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6392473Z "size": 396, 2025-12-04T08:59:14.6392918Z "digest": "sha256:549db4d6c618ecd9534658a233e3c90508f82d8735f965c2786b2eaa078869e5" 2025-12-04T08:59:14.6393428Z }, 2025-12-04T08:59:14.6393636Z { 2025-12-04T08:59:14.6393995Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6394455Z "size": 236860, 2025-12-04T08:59:14.6394999Z "digest": "sha256:5c63528cb580001e65104f4cb0809bf0673a00f989a7db42fd6d86aa1ec27cee" 2025-12-04T08:59:14.6395523Z }, 2025-12-04T08:59:14.6395718Z { 2025-12-04T08:59:14.6396079Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6396642Z + exit 0 2025-12-04T08:59:14.6396855Z "size": 231, 2025-12-04T08:59:14.6397315Z "digest": "sha256:75bd83b989a44e4d4119a3f972891025eb0e9ce95cfbe4a0ca5cdbe7130028d6" 2025-12-04T08:59:14.6397844Z }, 2025-12-04T08:59:14.6398057Z { 2025-12-04T08:59:14.6398405Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6398871Z "size": 3043497, 2025-12-04T08:59:14.6399334Z "digest": 
"sha256:de6e78970f517178cb91f36cd02bd9ca7b72a08fb82a0f9007516026f258c035" 2025-12-04T08:59:14.6399847Z }, 2025-12-04T08:59:14.6400058Z { 2025-12-04T08:59:14.6400418Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6400875Z "size": 1472, 2025-12-04T08:59:14.6401345Z "digest": "sha256:e13ed7c7e4736e81dc21af755b3363eb26e4d3b2f1ca988dfe65effa47d8fa42" 2025-12-04T08:59:14.6401877Z }, 2025-12-04T08:59:14.6402074Z { 2025-12-04T08:59:14.6402432Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6402898Z "size": 481, 2025-12-04T08:59:14.6403340Z "digest": "sha256:6e2949bcb74152577a0f20c38bcb6dd80f5e68427e3e531a80e08c9ecc73a979" 2025-12-04T08:59:14.6403870Z }, 2025-12-04T08:59:14.6404080Z { 2025-12-04T08:59:14.6404442Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6404895Z "size": 202, 2025-12-04T08:59:14.6405356Z "digest": "sha256:14d69d9aaec70287efd2fd35c4f93e43a29a4098458cc9fca1c93f02ad7356cb" 2025-12-04T08:59:14.6405887Z }, 2025-12-04T08:59:14.6406083Z { 2025-12-04T08:59:14.6406443Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6406905Z "size": 607, 2025-12-04T08:59:14.6407434Z "digest": "sha256:5c02769dd8e5bba2f7f5fd84bde9595fcb3bdbffcae497503fa846f9b5e78bf5" 2025-12-04T08:59:14.6407986Z }, 2025-12-04T08:59:14.6408197Z { 2025-12-04T08:59:14.6408548Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6409017Z "size": 7889619584, 2025-12-04T08:59:14.6409494Z "digest": "sha256:35041ce524ac4afec40ecd73b1393c830614f1f79d43a6439767a6c7d5b7027b" 2025-12-04T08:59:14.6410027Z }, 2025-12-04T08:59:14.6410226Z { 2025-12-04T08:59:14.6410588Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6411053Z "size": 830, 2025-12-04T08:59:14.6411493Z "digest": "sha256:2fa92dc5885e080e049ceb4139288b6c0e39fab34256945708b08ea55a1f7a0b" 2025-12-04T08:59:14.6412014Z }, 2025-12-04T08:59:14.6412227Z { 2025-12-04T08:59:14.6412575Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6413038Z "size": 33451739, 2025-12-04T08:59:14.6413512Z "digest": "sha256:2b85eafbd92a0e70a0a70154ad8bf4584095e576d95873368f30373f5966714a" 2025-12-04T08:59:14.6414029Z }, 2025-12-04T08:59:14.6414241Z { 2025-12-04T08:59:14.6414600Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6415053Z "size": 104, 2025-12-04T08:59:14.6415515Z "digest": "sha256:ff755a4ddad7880f23c6b767d432d6f1eafdb62b3ea18f8a98e22c441c099fcb" 2025-12-04T08:59:14.6416050Z }, 2025-12-04T08:59:14.6416261Z { 2025-12-04T08:59:14.6416868Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6417385Z "size": 1496, 2025-12-04T08:59:14.6417853Z "digest": "sha256:09eb41bdf42d8605b57b2363348154140904dec914b34a67298b82122bfce2b3" 2025-12-04T08:59:14.6418375Z }, 2025-12-04T08:59:14.6418593Z { 2025-12-04T08:59:14.6418964Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6419433Z "size": 458787828, 2025-12-04T08:59:14.6419921Z "digest": "sha256:11ede4d59e935e62f41b33220fe871794ab5e57ce724173b713368977683bcf6" 2025-12-04T08:59:14.6420542Z }, 2025-12-04T08:59:14.6420956Z { 2025-12-04T08:59:14.6421341Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6421823Z "size": 164, 2025-12-04T08:59:14.6422280Z "digest": "sha256:1283cd8f801a142172f3ab76fd472df8583223d9437de3e4d18d8cf98ea3fa98" 
2025-12-04T08:59:14.6422831Z }, 2025-12-04T08:59:14.6423052Z { 2025-12-04T08:59:14.6423432Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6423900Z "size": 346, 2025-12-04T08:59:14.6424366Z "digest": "sha256:024fa855425fa524ad4500660cf61d53be62b99556d31b8b280d14caba434a35" 2025-12-04T08:59:14.6424910Z }, 2025-12-04T08:59:14.6425119Z { 2025-12-04T08:59:14.6425494Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6425975Z "size": 32, 2025-12-04T08:59:14.6426435Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6426997Z }, 2025-12-04T08:59:14.6427222Z { 2025-12-04T08:59:14.6427584Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6428067Z "size": 106, 2025-12-04T08:59:14.6428545Z "digest": "sha256:303e6747a62efecf5efa1f97d0e66b40a3b39da8d79a51f75b89f4c92ae7ec52" 2025-12-04T08:59:14.6429103Z }, 2025-12-04T08:59:14.6429303Z { 2025-12-04T08:59:14.6429674Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6430158Z "size": 424, 2025-12-04T08:59:14.6430622Z "digest": "sha256:3017cdf4838bcc9a33daebc07487f8ae1f6bd6e7ce8322c14f5480e8db9ef90e" 2025-12-04T08:59:14.6431173Z }, 2025-12-04T08:59:14.6431388Z { 2025-12-04T08:59:14.6431746Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6432226Z "size": 19309374, 2025-12-04T08:59:14.6432813Z "digest": "sha256:6b6cd1c358e886dc6ed7fd46ac4bcc1a0a73b7b1301739ea1953478ee5d83f50" 2025-12-04T08:59:14.6433318Z }, 2025-12-04T08:59:14.6433650Z { 2025-12-04T08:59:14.6434012Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6434450Z "size": 108, 2025-12-04T08:59:14.6434892Z "digest": "sha256:b2dd045011241d1cf8889e2a7369d9fe4844dfe15529b520ccd6a59bd3c1532e" 2025-12-04T08:59:14.6435402Z }, 2025-12-04T08:59:14.6435605Z { 2025-12-04T08:59:14.6435941Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6436392Z "size": 827, 2025-12-04T08:59:14.6436830Z "digest": "sha256:55adc51fe5897031d4cf2f2b8fd162213f6e46a52848630c616606271b97952e" 2025-12-04T08:59:14.6437334Z }, 2025-12-04T08:59:14.6437543Z { 2025-12-04T08:59:14.6437891Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6459825Z "size": 724, 2025-12-04T08:59:14.6460321Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:14.6460876Z }, 2025-12-04T08:59:14.6461098Z { 2025-12-04T08:59:14.6461484Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6461978Z "size": 149, 2025-12-04T08:59:14.6462445Z "digest": "sha256:a43ca0e4b837964b12b7469194cfe939c26de027298040028975324dce25938a" 2025-12-04T08:59:14.6462968Z }, 2025-12-04T08:59:14.6463186Z { 2025-12-04T08:59:14.6463562Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6464035Z "size": 138, 2025-12-04T08:59:14.6464505Z "digest": "sha256:b7212f17fd1404837fcfdd086dd0e2667931e4db377d45d8d89a44390c84e11d" 2025-12-04T08:59:14.6465051Z }, 2025-12-04T08:59:14.6465267Z { 2025-12-04T08:59:14.6465625Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6466102Z "size": 141, 2025-12-04T08:59:14.6466566Z "digest": "sha256:083e42cac090e6486c35f392b64ee54448f5e4aa947003aeb3e1f92c8ea5c099" 2025-12-04T08:59:14.6467094Z }, 2025-12-04T08:59:14.6467308Z { 2025-12-04T08:59:14.6467684Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6468339Z "size": 32, 2025-12-04T08:59:14.6468915Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6469547Z }, 2025-12-04T08:59:14.6469737Z { 2025-12-04T08:59:14.6470089Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6470539Z "size": 223, 2025-12-04T08:59:14.6470973Z "digest": "sha256:0a00b784a4aac341795729b254f7edd09e811b7f51d0c58e0e6bfeeee6940503" 2025-12-04T08:59:14.6471489Z }, 2025-12-04T08:59:14.6471694Z { 2025-12-04T08:59:14.6472164Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6472580Z "size": 255, 2025-12-04T08:59:14.6472994Z "digest": "sha256:c6173c779f7ba143a21214ea5f032b141863a37ceb4c0ac01d3248c216ce5241" 2025-12-04T08:59:14.6473475Z }, 2025-12-04T08:59:14.6473656Z { 2025-12-04T08:59:14.6473986Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6474424Z "size": 145520672, 2025-12-04T08:59:14.6474849Z "digest": "sha256:ed3d1e3387b924585c332bf1bc252fa159cd0d25256a874043ff0141b1ab5ff7" 2025-12-04T08:59:14.6475331Z }, 2025-12-04T08:59:14.6475527Z { 2025-12-04T08:59:14.6475846Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6476276Z "size": 106, 2025-12-04T08:59:14.6476685Z "digest": "sha256:b29343478586aeee19d2a622661716f6f1591280c890f49b727a8da13a610784" 2025-12-04T08:59:14.6477160Z }, 2025-12-04T08:59:14.6477345Z { 2025-12-04T08:59:14.6477680Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6478118Z "size": 312293530, 2025-12-04T08:59:14.6478573Z "digest": "sha256:c6f0520487fb506bc4601fd84d5f28d8a76b203e004731e4b2067c2ab1a14e0b" 2025-12-04T08:59:14.6479067Z }, 2025-12-04T08:59:14.6479264Z { 2025-12-04T08:59:14.6479587Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6480022Z "size": 3058011133, 2025-12-04T08:59:14.6480574Z "digest": "sha256:148171691cd4c4d20310d490d4b4dd903490d04ea07fb8f7e668a28768683e9a" 2025-12-04T08:59:14.6481053Z }, 2025-12-04T08:59:14.6481254Z { 2025-12-04T08:59:14.6481592Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6482010Z "size": 129, 2025-12-04T08:59:14.6482440Z "digest": "sha256:2c666d30ed77fff9ff1167d41cd645dad98280fcbe941f5bc3828c7ae66b1287" 2025-12-04T08:59:14.6482941Z }, 2025-12-04T08:59:14.6483144Z { 2025-12-04T08:59:14.6483466Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6483898Z "size": 880, 2025-12-04T08:59:14.6484317Z "digest": "sha256:5d8d3a0a98e012c5068e0f3bae5a03e3148ecf2d063634eee4c9241a1e3fdfb5" 2025-12-04T08:59:14.6484794Z }, 2025-12-04T08:59:14.6484990Z { 2025-12-04T08:59:14.6485323Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6485737Z "size": 724, 2025-12-04T08:59:14.6486149Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:14.6486778Z }, 2025-12-04T08:59:14.6486959Z { 2025-12-04T08:59:14.6487288Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6487712Z "size": 139, 2025-12-04T08:59:14.6488125Z "digest": "sha256:b06bafce9e817295d8127207747c80aa18e04392ff0875844fc30a1e794a8a0c" 2025-12-04T08:59:14.6488597Z }, 2025-12-04T08:59:14.6488794Z { 2025-12-04T08:59:14.6489122Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6489535Z "size": 32, 
2025-12-04T08:59:14.6489954Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6490441Z }, 2025-12-04T08:59:14.6490621Z { 2025-12-04T08:59:14.6490948Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6491369Z "size": 159, 2025-12-04T08:59:14.6491781Z "digest": "sha256:15e0d7e4590d3d8f598d05aec3a92f891bf8b4605bcc38cc2de852b6014ef8f3" 2025-12-04T08:59:14.6492335Z }, 2025-12-04T08:59:14.6492528Z { 2025-12-04T08:59:14.6492850Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6493282Z "size": 1011, 2025-12-04T08:59:14.6493709Z "digest": "sha256:a514bd1add3164d8d7ca99aa19294c4ed8b97b074635d98714c4f598a959f4cd" 2025-12-04T08:59:14.6494199Z }, 2025-12-04T08:59:14.6494380Z { 2025-12-04T08:59:14.6494710Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6495136Z "size": 724, 2025-12-04T08:59:14.6495532Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:14.6496007Z }, 2025-12-04T08:59:14.6496198Z { 2025-12-04T08:59:14.6496619Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6497259Z "size": 134, 2025-12-04T08:59:14.6497762Z "digest": "sha256:57b84ee6000204f27a1d9bca199b19be4c86ecd324540dbdf239c56a6c3b34ea" 2025-12-04T08:59:14.6498299Z }, 2025-12-04T08:59:14.6498517Z { 2025-12-04T08:59:14.6498894Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6499364Z "size": 32, 2025-12-04T08:59:14.6499828Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6500373Z }, 2025-12-04T08:59:14.6500588Z { 2025-12-04T08:59:14.6500946Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6501425Z "size": 157, 2025-12-04T08:59:14.6501909Z "digest": "sha256:b8babeff6d817a5961dddc15c6bdfdbd05da187fae75d5804015f99fd7c066d8" 2025-12-04T08:59:14.6502454Z }, 2025-12-04T08:59:14.6502669Z { 2025-12-04T08:59:14.6503038Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6503502Z "size": 602, 2025-12-04T08:59:14.6503970Z "digest": "sha256:83779ddf6a85ab387f64a45f274cba245b69e4fd1931ff0b5d7d3efd4b7a43bc" 2025-12-04T08:59:14.6504515Z }, 2025-12-04T08:59:14.6504716Z { 2025-12-04T08:59:14.6505166Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6505651Z "size": 724, 2025-12-04T08:59:14.6506095Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:14.6506629Z }, 2025-12-04T08:59:14.6506846Z { 2025-12-04T08:59:14.6507218Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6507687Z "size": 155, 2025-12-04T08:59:14.6508158Z "digest": "sha256:8b7620c0d736cc79381207ce5afe2af90f0cd7f0cd394577d2c9520d7f74762f" 2025-12-04T08:59:14.6508703Z }, 2025-12-04T08:59:14.6508904Z { 2025-12-04T08:59:14.6509348Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6509774Z "size": 32, 2025-12-04T08:59:14.6510178Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6510665Z }, 2025-12-04T08:59:14.6510856Z { 2025-12-04T08:59:14.6511178Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6511610Z "size": 188, 2025-12-04T08:59:14.6512032Z "digest": "sha256:3bcfa090e4efd3677425f76baea9f1e0c50a75d8c6b5713ec05310f1dff24539" 
2025-12-04T08:59:14.6512523Z }, 2025-12-04T08:59:14.6512704Z { 2025-12-04T08:59:14.6513034Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6513461Z "size": 1370, 2025-12-04T08:59:14.6513873Z "digest": "sha256:eb0504ec4d9218a79896b604f73dc0ea5a0f96266ad9c2cdbbbe5f0f18222694" 2025-12-04T08:59:14.6514361Z }, 2025-12-04T08:59:14.6514547Z { 2025-12-04T08:59:14.6514872Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6515288Z "size": 32, 2025-12-04T08:59:14.6515688Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6516168Z }, 2025-12-04T08:59:14.6516348Z { 2025-12-04T08:59:14.6516657Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6517073Z "size": 136, 2025-12-04T08:59:14.6517546Z "digest": "sha256:15d0fec09d7b196a1462d51516ee90fc3443ba178d3e56d59cacf32146b4321d" 2025-12-04T08:59:14.6518013Z }, 2025-12-04T08:59:14.6518193Z { 2025-12-04T08:59:14.6518511Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6518919Z "size": 528, 2025-12-04T08:59:14.6519331Z "digest": "sha256:cca81fcc62a949959ca4dd3c9056fb293d548ef8607127eeeef6cfd3a8897ca8" 2025-12-04T08:59:14.6519810Z }, 2025-12-04T08:59:14.6519991Z { 2025-12-04T08:59:14.6520302Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6520714Z "size": 32, 2025-12-04T08:59:14.6521465Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6522043Z }, 2025-12-04T08:59:14.6522245Z { 2025-12-04T08:59:14.6522605Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6523066Z "size": 104, 2025-12-04T08:59:14.6523539Z "digest": "sha256:b0b8f9b5c6ab98db9cd830dc584e1b6aec9add139e4cc48d8c243d36691e25b4" 2025-12-04T08:59:14.6524085Z }, 2025-12-04T08:59:14.6524280Z { 2025-12-04T08:59:14.6524634Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6525091Z "size": 435, 2025-12-04T08:59:14.6525528Z "digest": "sha256:0606ca4d47a8a70e91e92b03ca51a85e731641b09342136a54ef2f2a6d9dfb44" 2025-12-04T08:59:14.6526044Z }, 2025-12-04T08:59:14.6526241Z { 2025-12-04T08:59:14.6526589Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6527041Z "size": 32, 2025-12-04T08:59:14.6527490Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6528021Z }, 2025-12-04T08:59:14.6528209Z { 2025-12-04T08:59:14.6528557Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6529017Z "size": 109, 2025-12-04T08:59:14.6529579Z "digest": "sha256:2f80a4e1b3b95ed67bb781ea787e8a63e46de79117d9d8e65c257072b38afa2d" 2025-12-04T08:59:14.6530118Z }, 2025-12-04T08:59:14.6530321Z { 2025-12-04T08:59:14.6530668Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6531134Z "size": 1896, 2025-12-04T08:59:14.6531588Z "digest": "sha256:35c916fb1bd057e517dcab78c3a2a018e68096d8993892ad84f47562d37ae352" 2025-12-04T08:59:14.6532118Z }, 2025-12-04T08:59:14.6532311Z { 2025-12-04T08:59:14.6532665Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6533130Z "size": 197526165, 2025-12-04T08:59:14.6533781Z "digest": "sha256:195537b7dafc96192f768323b1a8cc2a914d41959849b73198579576b0872a44" 2025-12-04T08:59:14.6534249Z }, 2025-12-04T08:59:14.6534430Z { 2025-12-04T08:59:14.6534736Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6535150Z "size": 106, 2025-12-04T08:59:14.6535555Z "digest": "sha256:dc454fd3967e5735b2498b7f1d958a2c626987d5e4ce225ca98da3cd945b59f3" 2025-12-04T08:59:14.6536019Z }, 2025-12-04T08:59:14.6536199Z { 2025-12-04T08:59:14.6536595Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6537211Z "size": 165, 2025-12-04T08:59:14.6537748Z "digest": "sha256:701b34f115fa897181c046dc37288e87cbc3ad74c36a9e2224b5bfe7c5703afb" 2025-12-04T08:59:14.6538284Z }, 2025-12-04T08:59:14.6538493Z { 2025-12-04T08:59:14.6538844Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6539307Z "size": 7944, 2025-12-04T08:59:14.6539765Z "digest": "sha256:39cefc00ffedebc9098261c798408b87a20c95a88fccb110594077f48dadf760" 2025-12-04T08:59:14.6540285Z }, 2025-12-04T08:59:14.6540480Z { 2025-12-04T08:59:14.6540838Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6541292Z "size": 8071, 2025-12-04T08:59:14.6541751Z "digest": "sha256:6ae51eb61a325b2c2995a5088c81aa20821b75be65b5aa722c7c40556b5d03ea" 2025-12-04T08:59:14.6542284Z }, 2025-12-04T08:59:14.6542480Z { 2025-12-04T08:59:14.6542932Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6543403Z "size": 304, 2025-12-04T08:59:14.6543850Z "digest": "sha256:1fd5341e66dfc0c1ae23af014641a92a6fd02640c528fe6d4dc55921ed659a26" 2025-12-04T08:59:14.6544380Z }, 2025-12-04T08:59:14.6544587Z { 2025-12-04T08:59:14.6544941Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6545400Z "size": 13364291, 2025-12-04T08:59:14.6545874Z "digest": "sha256:72a7c87e35e40ab796f90aee1b51add7902f0cdc44406d2505b6c6a1f55a8da6" 2025-12-04T08:59:14.6546406Z }, 2025-12-04T08:59:14.6546596Z { 2025-12-04T08:59:14.6546947Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6547402Z "size": 108, 2025-12-04T08:59:14.6547853Z "digest": "sha256:ec36862ac98ebaac52ee1a8b1d162d45bd0e3bf59ae7e19c8f80ad3960b4c600" 2025-12-04T08:59:14.6548388Z }, 2025-12-04T08:59:14.6548695Z { 2025-12-04T08:59:14.6549030Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6549557Z "size": 54145699, 2025-12-04T08:59:14.6549980Z "digest": "sha256:05ddbf246e8add0e293474dbf88bb028d5a295a25ac59e8648a18db644377773" 2025-12-04T08:59:14.6550451Z }, 2025-12-04T08:59:14.6550619Z { 2025-12-04T08:59:14.6550929Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6551335Z "size": 32, 2025-12-04T08:59:14.6551728Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6552199Z } 2025-12-04T08:59:14.6552374Z ] 2025-12-04T08:59:14.6552543Z } 2025-12-04T08:59:14.6579052Z ##[group]Run set -eux 2025-12-04T08:59:14.6579369Z set -eux 2025-12-04T08:59:14.6579848Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-12-04T08:59:14.6581300Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-12-04T08:59:14.6588695Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:14.6589238Z env: 2025-12-04T08:59:14.6589451Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:14.6589722Z ##[endgroup] 2025-12-04T08:59:14.6617757Z + aws secretsmanager get-secret-value 
--secret-id docker_hub_readonly_token 2025-12-04T08:59:14.6618368Z + jq --raw-output .SecretString 2025-12-04T08:59:14.6619288Z + jq -r .docker_hub_readonly_token 2025-12-04T08:59:14.6620283Z + docker login --username pytorchbot --password-stdin 2025-12-04T08:59:15.2324117Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:15.2324852Z Configure a credential helper to remove this warning. See 2025-12-04T08:59:15.2325533Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:15.2325997Z 2025-12-04T08:59:15.2326145Z Login Succeeded 2025-12-04T08:59:15.2415382Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T08:59:15.2415789Z tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T08:59:15.2416232Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}" 2025-12-04T08:59:15.2423641Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:15.2424080Z env: 2025-12-04T08:59:15.2424333Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:15.2425315Z ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.2426309Z ##[endgroup] 2025-12-04T08:59:15.2453778Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.2502853Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-12-04T08:59:15.2503315Z with: 2025-12-04T08:59:15.2504111Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.2505246Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:15.2505655Z env: 2025-12-04T08:59:15.2505866Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:15.2506143Z ##[endgroup] 2025-12-04T08:59:15.2520503Z ##[group]Run set -x 2025-12-04T08:59:15.2520965Z set -x 2025-12-04T08:59:15.2521364Z set +e 2025-12-04T08:59:15.2521623Z  2025-12-04T08:59:15.2521872Z login() { 2025-12-04T08:59:15.2522419Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T08:59:15.2523035Z } 2025-12-04T08:59:15.2523278Z  2025-12-04T08:59:15.2523558Z retry () { 2025-12-04T08:59:15.2523864Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T08:59:15.2524234Z } 2025-12-04T08:59:15.2524479Z  2025-12-04T08:59:15.2524759Z retry login "${DOCKER_REGISTRY}" 2025-12-04T08:59:15.2525106Z  2025-12-04T08:59:15.2525674Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-12-04T08:59:15.2526454Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-12-04T08:59:15.2526877Z  2025-12-04T08:59:15.2527113Z set -e 2025-12-04T08:59:15.2527600Z # ignore output since only exit code is used for conditional 2025-12-04T08:59:15.2528119Z # only pull docker image if it's not available locally 2025-12-04T08:59:15.2528676Z if ! 
docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-12-04T08:59:15.2529209Z  retry docker pull "${DOCKER_IMAGE}" 2025-12-04T08:59:15.2529543Z fi 2025-12-04T08:59:15.2534980Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:15.2535380Z env: 2025-12-04T08:59:15.2535609Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:15.2536579Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.2537848Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:15.2538304Z ##[endgroup] 2025-12-04T08:59:15.2563481Z + set +e 2025-12-04T08:59:15.2564025Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:15.2564514Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:15.2567917Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:59:15.2568845Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:15.7876858Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:15.7877557Z Configure a credential helper to remove this warning. See 2025-12-04T08:59:15.7878220Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:15.7878702Z 2025-12-04T08:59:15.7878820Z Login Succeeded 2025-12-04T08:59:15.7900336Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.7901456Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-12-04T08:59:15.9818499Z + IMAGE_SIZE=15091.581844329834 2025-12-04T08:59:15.9818961Z + echo 'Compressed size of image in MB: 15091.581844329834' 2025-12-04T08:59:15.9819384Z + set -e 2025-12-04T08:59:15.9819691Z Compressed size of image in MB: 15091.581844329834 2025-12-04T08:59:15.9821306Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.9939521Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.9941415Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:16.2148805Z pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image 2025-12-04T08:59:16.2151637Z 63e5bc7682b8: Pulling fs layer 2025-12-04T08:59:16.2151967Z 0678d56345c9: Pulling fs layer 2025-12-04T08:59:16.2152316Z 45f5c9ddfce7: Pulling fs layer 2025-12-04T08:59:16.2152622Z 086b1df51ac1: Pulling fs layer 2025-12-04T08:59:16.2152938Z fe8a7b64bf98: Pulling fs layer 2025-12-04T08:59:16.2153252Z 7680723e9a57: Pulling fs layer 2025-12-04T08:59:16.2153568Z 9c5027aeeb4e: Pulling fs layer 2025-12-04T08:59:16.2153879Z 9a5652110360: Pulling fs layer 2025-12-04T08:59:16.2154185Z 375c4427e914: Pulling fs layer 2025-12-04T08:59:16.2154487Z a86faaa7dbdd: Pulling fs layer 2025-12-04T08:59:16.2154809Z fb7848686804: Pulling fs layer 2025-12-04T08:59:16.2155135Z 3541df015cdb: Pulling fs layer 2025-12-04T08:59:16.2155436Z 79dc80f426b2: Pulling fs layer 
2025-12-04T08:59:16.2155752Z a13fcc1b90bb: Pulling fs layer 2025-12-04T08:59:16.2156071Z 4f4fb700ef54: Pulling fs layer 2025-12-04T08:59:16.2156371Z 549db4d6c618: Pulling fs layer 2025-12-04T08:59:16.2156685Z 5c63528cb580: Pulling fs layer 2025-12-04T08:59:16.2156995Z 75bd83b989a4: Pulling fs layer 2025-12-04T08:59:16.2157308Z de6e78970f51: Pulling fs layer 2025-12-04T08:59:16.2157606Z e13ed7c7e473: Pulling fs layer 2025-12-04T08:59:16.2157920Z 6e2949bcb741: Pulling fs layer 2025-12-04T08:59:16.2158417Z 14d69d9aaec7: Pulling fs layer 2025-12-04T08:59:16.2158726Z 5c02769dd8e5: Pulling fs layer 2025-12-04T08:59:16.2159051Z 35041ce524ac: Pulling fs layer 2025-12-04T08:59:16.2159364Z fe8a7b64bf98: Waiting 2025-12-04T08:59:16.2159643Z 2fa92dc5885e: Pulling fs layer 2025-12-04T08:59:16.2159967Z 2b85eafbd92a: Pulling fs layer 2025-12-04T08:59:16.2160279Z 086b1df51ac1: Waiting 2025-12-04T08:59:16.2160565Z ff755a4ddad7: Pulling fs layer 2025-12-04T08:59:16.2160870Z 7680723e9a57: Waiting 2025-12-04T08:59:16.2161173Z 09eb41bdf42d: Pulling fs layer 2025-12-04T08:59:16.2161487Z 9c5027aeeb4e: Waiting 2025-12-04T08:59:16.2161748Z 9a5652110360: Waiting 2025-12-04T08:59:16.2162019Z 3541df015cdb: Waiting 2025-12-04T08:59:16.2162350Z 11ede4d59e93: Pulling fs layer 2025-12-04T08:59:16.2162666Z 375c4427e914: Waiting 2025-12-04T08:59:16.2162968Z 1283cd8f801a: Pulling fs layer 2025-12-04T08:59:16.2163295Z 024fa855425f: Pulling fs layer 2025-12-04T08:59:16.2163619Z 303e6747a62e: Pulling fs layer 2025-12-04T08:59:16.2164035Z e13ed7c7e473: Waiting 2025-12-04T08:59:16.2164307Z 79dc80f426b2: Waiting 2025-12-04T08:59:16.2164580Z a86faaa7dbdd: Waiting 2025-12-04T08:59:16.2164857Z 3017cdf4838b: Pulling fs layer 2025-12-04T08:59:16.2165163Z fb7848686804: Waiting 2025-12-04T08:59:16.2165443Z 6b6cd1c358e8: Pulling fs layer 2025-12-04T08:59:16.2165739Z 6e2949bcb741: Waiting 2025-12-04T08:59:16.2166021Z b2dd04501124: Pulling fs layer 2025-12-04T08:59:16.2166333Z 14d69d9aaec7: Waiting 2025-12-04T08:59:16.2166615Z 55adc51fe589: Pulling fs layer 2025-12-04T08:59:16.2166903Z de6e78970f51: Waiting 2025-12-04T08:59:16.2167167Z 75bd83b989a4: Waiting 2025-12-04T08:59:16.2167428Z 35041ce524ac: Waiting 2025-12-04T08:59:16.2167679Z a13fcc1b90bb: Waiting 2025-12-04T08:59:16.2167941Z 5c63528cb580: Waiting 2025-12-04T08:59:16.2168215Z a43ca0e4b837: Pulling fs layer 2025-12-04T08:59:16.2168501Z 4f4fb700ef54: Waiting 2025-12-04T08:59:16.2168761Z 5c02769dd8e5: Waiting 2025-12-04T08:59:16.2169024Z 2fa92dc5885e: Waiting 2025-12-04T08:59:16.2169285Z b7212f17fd14: Pulling fs layer 2025-12-04T08:59:16.2169585Z 2b85eafbd92a: Waiting 2025-12-04T08:59:16.2169852Z 1283cd8f801a: Waiting 2025-12-04T08:59:16.2170336Z 09eb41bdf42d: Waiting 2025-12-04T08:59:16.2170624Z 083e42cac090: Pulling fs layer 2025-12-04T08:59:16.2170928Z 11ede4d59e93: Waiting 2025-12-04T08:59:16.2171182Z ff755a4ddad7: Waiting 2025-12-04T08:59:16.2171451Z 024fa855425f: Waiting 2025-12-04T08:59:16.2171824Z b2dd04501124: Waiting 2025-12-04T08:59:16.2172087Z 0a00b784a4aa: Pulling fs layer 2025-12-04T08:59:16.2172421Z 549db4d6c618: Waiting 2025-12-04T08:59:16.2172682Z a43ca0e4b837: Waiting 2025-12-04T08:59:16.2172928Z 3017cdf4838b: Waiting 2025-12-04T08:59:16.2173191Z 55adc51fe589: Waiting 2025-12-04T08:59:16.2173465Z c6173c779f7b: Pulling fs layer 2025-12-04T08:59:16.2173766Z 6b6cd1c358e8: Waiting 2025-12-04T08:59:16.2174014Z 303e6747a62e: Waiting 2025-12-04T08:59:16.2174272Z b7212f17fd14: Waiting 2025-12-04T08:59:16.2174534Z 0a00b784a4aa: Waiting 2025-12-04T08:59:16.2174799Z 
ed3d1e3387b9: Pulling fs layer 2025-12-04T08:59:16.2175102Z 083e42cac090: Waiting 2025-12-04T08:59:16.2175374Z b29343478586: Pulling fs layer 2025-12-04T08:59:16.2175666Z ed3d1e3387b9: Waiting 2025-12-04T08:59:16.2175943Z c6f0520487fb: Pulling fs layer 2025-12-04T08:59:16.2176253Z 148171691cd4: Pulling fs layer 2025-12-04T08:59:16.2176674Z 2c666d30ed77: Pulling fs layer 2025-12-04T08:59:16.2177180Z 5d8d3a0a98e0: Pulling fs layer 2025-12-04T08:59:16.2177516Z b06bafce9e81: Pulling fs layer 2025-12-04T08:59:16.2177828Z 15e0d7e4590d: Pulling fs layer 2025-12-04T08:59:16.2178140Z 2c666d30ed77: Waiting 2025-12-04T08:59:16.2178425Z a514bd1add31: Pulling fs layer 2025-12-04T08:59:16.2178723Z 5d8d3a0a98e0: Waiting 2025-12-04T08:59:16.2179008Z 57b84ee60002: Pulling fs layer 2025-12-04T08:59:16.2179307Z b29343478586: Waiting 2025-12-04T08:59:16.2179573Z 15e0d7e4590d: Waiting 2025-12-04T08:59:16.2179824Z 148171691cd4: Waiting 2025-12-04T08:59:16.2180095Z a514bd1add31: Waiting 2025-12-04T08:59:16.2180385Z b8babeff6d81: Pulling fs layer 2025-12-04T08:59:16.2180697Z 83779ddf6a85: Pulling fs layer 2025-12-04T08:59:16.2181002Z 57b84ee60002: Waiting 2025-12-04T08:59:16.2181274Z b06bafce9e81: Waiting 2025-12-04T08:59:16.2181545Z b8babeff6d81: Waiting 2025-12-04T08:59:16.2181828Z 8b7620c0d736: Pulling fs layer 2025-12-04T08:59:16.2182154Z 3bcfa090e4ef: Pulling fs layer 2025-12-04T08:59:16.2182464Z eb0504ec4d92: Pulling fs layer 2025-12-04T08:59:16.2182774Z 83779ddf6a85: Waiting 2025-12-04T08:59:16.2183043Z 8b7620c0d736: Waiting 2025-12-04T08:59:16.2183297Z eb0504ec4d92: Waiting 2025-12-04T08:59:16.2183582Z 15d0fec09d7b: Pulling fs layer 2025-12-04T08:59:16.2183909Z cca81fcc62a9: Pulling fs layer 2025-12-04T08:59:16.2184226Z b0b8f9b5c6ab: Pulling fs layer 2025-12-04T08:59:16.2184548Z 0606ca4d47a8: Pulling fs layer 2025-12-04T08:59:16.2184864Z 15d0fec09d7b: Waiting 2025-12-04T08:59:16.2185126Z 0606ca4d47a8: Waiting 2025-12-04T08:59:16.2185402Z cca81fcc62a9: Waiting 2025-12-04T08:59:16.2185692Z 2f80a4e1b3b9: Pulling fs layer 2025-12-04T08:59:16.2186015Z 35c916fb1bd0: Pulling fs layer 2025-12-04T08:59:16.2186324Z 195537b7dafc: Pulling fs layer 2025-12-04T08:59:16.2186633Z 2f80a4e1b3b9: Waiting 2025-12-04T08:59:16.2186922Z dc454fd3967e: Pulling fs layer 2025-12-04T08:59:16.2187219Z 35c916fb1bd0: Waiting 2025-12-04T08:59:16.2187490Z 195537b7dafc: Waiting 2025-12-04T08:59:16.2187771Z 701b34f115fa: Pulling fs layer 2025-12-04T08:59:16.2188086Z 39cefc00ffed: Pulling fs layer 2025-12-04T08:59:16.2188526Z 6ae51eb61a32: Pulling fs layer 2025-12-04T08:59:16.2188826Z dc454fd3967e: Waiting 2025-12-04T08:59:16.2189076Z 701b34f115fa: Waiting 2025-12-04T08:59:16.2189339Z 39cefc00ffed: Waiting 2025-12-04T08:59:16.2189607Z 6ae51eb61a32: Waiting 2025-12-04T08:59:16.2189874Z 1fd5341e66df: Pulling fs layer 2025-12-04T08:59:16.2190191Z 72a7c87e35e4: Pulling fs layer 2025-12-04T08:59:16.2190509Z ec36862ac98e: Pulling fs layer 2025-12-04T08:59:16.2190805Z 1fd5341e66df: Waiting 2025-12-04T08:59:16.2191072Z 72a7c87e35e4: Waiting 2025-12-04T08:59:16.2191354Z 05ddbf246e8a: Pulling fs layer 2025-12-04T08:59:16.2191648Z ec36862ac98e: Waiting 2025-12-04T08:59:16.2191919Z 05ddbf246e8a: Waiting 2025-12-04T08:59:16.3138845Z 0678d56345c9: Download complete 2025-12-04T08:59:16.3857980Z 086b1df51ac1: Download complete 2025-12-04T08:59:16.4618803Z fe8a7b64bf98: Verifying Checksum 2025-12-04T08:59:16.4619200Z fe8a7b64bf98: Download complete 2025-12-04T08:59:16.5371005Z 7680723e9a57: Verifying Checksum 2025-12-04T08:59:16.5371668Z 7680723e9a57: 
Download complete 2025-12-04T08:59:16.5660555Z 63e5bc7682b8: Download complete 2025-12-04T08:59:16.6267646Z 9c5027aeeb4e: Download complete 2025-12-04T08:59:16.6376534Z 9a5652110360: Verifying Checksum 2025-12-04T08:59:16.6377159Z 9a5652110360: Download complete 2025-12-04T08:59:16.7122229Z a86faaa7dbdd: Verifying Checksum 2025-12-04T08:59:16.7122886Z a86faaa7dbdd: Download complete 2025-12-04T08:59:16.8124696Z fb7848686804: Download complete 2025-12-04T08:59:16.8977642Z 3541df015cdb: Verifying Checksum 2025-12-04T08:59:16.8978071Z 3541df015cdb: Download complete 2025-12-04T08:59:16.9539326Z 79dc80f426b2: Download complete 2025-12-04T08:59:17.3488608Z 63e5bc7682b8: Pull complete 2025-12-04T08:59:17.3613598Z 0678d56345c9: Pull complete 2025-12-04T08:59:17.7931346Z 375c4427e914: Download complete 2025-12-04T08:59:17.8016962Z 4f4fb700ef54: Verifying Checksum 2025-12-04T08:59:17.8017396Z 4f4fb700ef54: Download complete 2025-12-04T08:59:17.8702434Z 549db4d6c618: Verifying Checksum 2025-12-04T08:59:17.8702836Z 549db4d6c618: Download complete 2025-12-04T08:59:17.9506311Z 5c63528cb580: Download complete 2025-12-04T08:59:18.0292544Z 75bd83b989a4: Download complete 2025-12-04T08:59:18.1051889Z de6e78970f51: Verifying Checksum 2025-12-04T08:59:18.1052363Z de6e78970f51: Download complete 2025-12-04T08:59:18.1772628Z e13ed7c7e473: Download complete 2025-12-04T08:59:18.2447385Z 6e2949bcb741: Verifying Checksum 2025-12-04T08:59:18.2448027Z 6e2949bcb741: Download complete 2025-12-04T08:59:18.3044726Z 14d69d9aaec7: Verifying Checksum 2025-12-04T08:59:18.3045222Z 14d69d9aaec7: Download complete 2025-12-04T08:59:18.3772037Z 5c02769dd8e5: Verifying Checksum 2025-12-04T08:59:18.3772451Z 5c02769dd8e5: Download complete 2025-12-04T08:59:19.4058909Z 45f5c9ddfce7: Verifying Checksum 2025-12-04T08:59:19.4059338Z 45f5c9ddfce7: Download complete 2025-12-04T08:59:19.4865590Z 2fa92dc5885e: Verifying Checksum 2025-12-04T08:59:19.4865992Z 2fa92dc5885e: Download complete 2025-12-04T08:59:19.8783672Z 2b85eafbd92a: Verifying Checksum 2025-12-04T08:59:19.8784217Z 2b85eafbd92a: Download complete 2025-12-04T08:59:19.9653918Z ff755a4ddad7: Verifying Checksum 2025-12-04T08:59:19.9654368Z ff755a4ddad7: Download complete 2025-12-04T08:59:20.0504264Z 09eb41bdf42d: Download complete 2025-12-04T08:59:24.7059571Z 11ede4d59e93: Verifying Checksum 2025-12-04T08:59:24.7060003Z 11ede4d59e93: Download complete 2025-12-04T08:59:24.7667258Z 1283cd8f801a: Download complete 2025-12-04T08:59:24.8426365Z 024fa855425f: Verifying Checksum 2025-12-04T08:59:24.8427046Z 024fa855425f: Download complete 2025-12-04T08:59:24.9067080Z 303e6747a62e: Verifying Checksum 2025-12-04T08:59:24.9067537Z 303e6747a62e: Download complete 2025-12-04T08:59:24.9922956Z 3017cdf4838b: Verifying Checksum 2025-12-04T08:59:24.9923350Z 3017cdf4838b: Download complete 2025-12-04T08:59:25.2376914Z 6b6cd1c358e8: Verifying Checksum 2025-12-04T08:59:25.2377373Z 6b6cd1c358e8: Download complete 2025-12-04T08:59:25.3012118Z b2dd04501124: Verifying Checksum 2025-12-04T08:59:25.3012593Z b2dd04501124: Download complete 2025-12-04T08:59:25.3938960Z 55adc51fe589: Verifying Checksum 2025-12-04T08:59:25.3939423Z 55adc51fe589: Download complete 2025-12-04T08:59:25.4922165Z a43ca0e4b837: Verifying Checksum 2025-12-04T08:59:25.4922584Z a43ca0e4b837: Download complete 2025-12-04T08:59:25.6031118Z b7212f17fd14: Verifying Checksum 2025-12-04T08:59:25.6031542Z b7212f17fd14: Download complete 2025-12-04T08:59:25.6727061Z 083e42cac090: Verifying Checksum 2025-12-04T08:59:25.6727453Z 083e42cac090: 
Download complete 2025-12-04T08:59:25.7600765Z 0a00b784a4aa: Download complete 2025-12-04T08:59:25.8454634Z c6173c779f7b: Verifying Checksum 2025-12-04T08:59:25.8455044Z c6173c779f7b: Download complete 2025-12-04T08:59:26.4646880Z 45f5c9ddfce7: Pull complete 2025-12-04T08:59:26.4866474Z 086b1df51ac1: Pull complete 2025-12-04T08:59:26.5078507Z fe8a7b64bf98: Pull complete 2025-12-04T08:59:26.5330514Z 7680723e9a57: Pull complete 2025-12-04T08:59:26.5701933Z 9c5027aeeb4e: Pull complete 2025-12-04T08:59:26.5957046Z 9a5652110360: Pull complete 2025-12-04T08:59:27.3496441Z ed3d1e3387b9: Verifying Checksum 2025-12-04T08:59:27.3497311Z ed3d1e3387b9: Download complete 2025-12-04T08:59:27.4208432Z b29343478586: Verifying Checksum 2025-12-04T08:59:27.4209080Z b29343478586: Download complete 2025-12-04T08:59:29.0262934Z 375c4427e914: Pull complete 2025-12-04T08:59:29.3886945Z a86faaa7dbdd: Pull complete 2025-12-04T08:59:29.8285588Z fb7848686804: Pull complete 2025-12-04T08:59:30.3064029Z 3541df015cdb: Pull complete 2025-12-04T08:59:30.5964481Z c6f0520487fb: Verifying Checksum 2025-12-04T08:59:30.5964894Z c6f0520487fb: Download complete 2025-12-04T08:59:30.7830707Z 79dc80f426b2: Pull complete 2025-12-04T08:59:48.8463210Z a13fcc1b90bb: Verifying Checksum 2025-12-04T08:59:48.8463662Z a13fcc1b90bb: Download complete 2025-12-04T08:59:48.9452747Z 2c666d30ed77: Verifying Checksum 2025-12-04T08:59:48.9453679Z 2c666d30ed77: Download complete 2025-12-04T08:59:49.0229660Z 5d8d3a0a98e0: Verifying Checksum 2025-12-04T08:59:49.0230106Z 5d8d3a0a98e0: Download complete 2025-12-04T08:59:49.1138399Z b06bafce9e81: Download complete 2025-12-04T08:59:49.1899603Z 15e0d7e4590d: Verifying Checksum 2025-12-04T08:59:49.1900021Z 15e0d7e4590d: Download complete 2025-12-04T08:59:49.2511547Z a514bd1add31: Download complete 2025-12-04T08:59:49.3353521Z 57b84ee60002: Download complete 2025-12-04T08:59:49.4183582Z b8babeff6d81: Verifying Checksum 2025-12-04T08:59:49.4184020Z b8babeff6d81: Download complete 2025-12-04T08:59:49.4917158Z 83779ddf6a85: Download complete 2025-12-04T08:59:49.5880223Z 8b7620c0d736: Verifying Checksum 2025-12-04T08:59:49.5880808Z 8b7620c0d736: Download complete 2025-12-04T08:59:49.6724927Z 3bcfa090e4ef: Verifying Checksum 2025-12-04T08:59:49.6725372Z 3bcfa090e4ef: Download complete 2025-12-04T08:59:49.7589238Z eb0504ec4d92: Verifying Checksum 2025-12-04T08:59:49.7589657Z eb0504ec4d92: Download complete 2025-12-04T08:59:49.8295572Z 15d0fec09d7b: Verifying Checksum 2025-12-04T08:59:49.8295995Z 15d0fec09d7b: Download complete 2025-12-04T08:59:49.8955698Z cca81fcc62a9: Verifying Checksum 2025-12-04T08:59:49.8956144Z cca81fcc62a9: Download complete 2025-12-04T08:59:49.9851067Z b0b8f9b5c6ab: Verifying Checksum 2025-12-04T08:59:49.9851474Z b0b8f9b5c6ab: Download complete 2025-12-04T08:59:50.0694573Z 0606ca4d47a8: Verifying Checksum 2025-12-04T08:59:50.0695031Z 0606ca4d47a8: Download complete 2025-12-04T08:59:50.1420617Z 2f80a4e1b3b9: Verifying Checksum 2025-12-04T08:59:50.1421197Z 2f80a4e1b3b9: Download complete 2025-12-04T08:59:50.2341657Z 35c916fb1bd0: Download complete 2025-12-04T08:59:52.2494136Z 195537b7dafc: Verifying Checksum 2025-12-04T08:59:52.2494977Z 195537b7dafc: Download complete 2025-12-04T08:59:52.3365558Z dc454fd3967e: Verifying Checksum 2025-12-04T08:59:52.3366046Z dc454fd3967e: Download complete 2025-12-04T08:59:52.4102503Z 701b34f115fa: Verifying Checksum 2025-12-04T08:59:52.4102919Z 701b34f115fa: Download complete 2025-12-04T08:59:52.4943408Z 39cefc00ffed: Download complete 
2025-12-04T08:59:52.5624520Z 6ae51eb61a32: Verifying Checksum 2025-12-04T08:59:52.5625070Z 6ae51eb61a32: Download complete 2025-12-04T08:59:52.6210759Z 1fd5341e66df: Verifying Checksum 2025-12-04T08:59:52.6211342Z 1fd5341e66df: Download complete 2025-12-04T08:59:52.8405146Z 72a7c87e35e4: Verifying Checksum 2025-12-04T08:59:52.8405568Z 72a7c87e35e4: Download complete 2025-12-04T08:59:52.9443187Z ec36862ac98e: Verifying Checksum 2025-12-04T08:59:52.9443673Z ec36862ac98e: Download complete 2025-12-04T08:59:53.5355418Z 05ddbf246e8a: Verifying Checksum 2025-12-04T08:59:53.5356111Z 05ddbf246e8a: Download complete 2025-12-04T09:00:01.2258856Z 148171691cd4: Verifying Checksum 2025-12-04T09:00:01.2259278Z 148171691cd4: Download complete 2025-12-04T09:00:34.1473581Z a13fcc1b90bb: Pull complete 2025-12-04T09:00:34.3923260Z 4f4fb700ef54: Pull complete 2025-12-04T09:00:34.8376112Z 549db4d6c618: Pull complete 2025-12-04T09:00:35.5248241Z 5c63528cb580: Pull complete 2025-12-04T09:00:36.0107039Z 75bd83b989a4: Pull complete 2025-12-04T09:00:36.4167606Z de6e78970f51: Pull complete 2025-12-04T09:00:36.5665881Z e13ed7c7e473: Pull complete 2025-12-04T09:00:36.6906633Z 6e2949bcb741: Pull complete 2025-12-04T09:00:37.0520490Z 14d69d9aaec7: Pull complete 2025-12-04T09:00:37.3267714Z 35041ce524ac: Verifying Checksum 2025-12-04T09:00:37.3268623Z 35041ce524ac: Download complete 2025-12-04T09:00:37.5019481Z 5c02769dd8e5: Pull complete 2025-12-04T09:01:49.8274613Z 35041ce524ac: Pull complete 2025-12-04T09:01:50.2368669Z 2fa92dc5885e: Pull complete 2025-12-04T09:01:51.2370662Z 2b85eafbd92a: Pull complete 2025-12-04T09:01:51.6898264Z ff755a4ddad7: Pull complete 2025-12-04T09:01:52.1964615Z 09eb41bdf42d: Pull complete 2025-12-04T09:02:00.1003649Z 11ede4d59e93: Pull complete 2025-12-04T09:02:00.5211534Z 1283cd8f801a: Pull complete 2025-12-04T09:02:00.8810693Z 024fa855425f: Pull complete 2025-12-04T09:02:01.3505304Z 303e6747a62e: Pull complete 2025-12-04T09:02:01.5887823Z 3017cdf4838b: Pull complete 2025-12-04T09:02:01.9022683Z 6b6cd1c358e8: Pull complete 2025-12-04T09:02:01.9250236Z b2dd04501124: Pull complete 2025-12-04T09:02:01.9487715Z 55adc51fe589: Pull complete 2025-12-04T09:02:01.9942336Z a43ca0e4b837: Pull complete 2025-12-04T09:02:02.0183851Z b7212f17fd14: Pull complete 2025-12-04T09:02:02.0420552Z 083e42cac090: Pull complete 2025-12-04T09:02:02.0874458Z 0a00b784a4aa: Pull complete 2025-12-04T09:02:02.1125578Z c6173c779f7b: Pull complete 2025-12-04T09:02:05.0297442Z ed3d1e3387b9: Pull complete 2025-12-04T09:02:05.0546716Z b29343478586: Pull complete 2025-12-04T09:02:06.4151579Z c6f0520487fb: Pull complete 2025-12-04T09:02:57.7145882Z 148171691cd4: Pull complete 2025-12-04T09:02:58.1499512Z 2c666d30ed77: Pull complete 2025-12-04T09:02:58.6080512Z 5d8d3a0a98e0: Pull complete 2025-12-04T09:02:59.5830139Z b06bafce9e81: Pull complete 2025-12-04T09:03:00.3438614Z 15e0d7e4590d: Pull complete 2025-12-04T09:03:00.7960552Z a514bd1add31: Pull complete 2025-12-04T09:03:01.8162153Z 57b84ee60002: Pull complete 2025-12-04T09:03:02.3426793Z b8babeff6d81: Pull complete 2025-12-04T09:03:02.6121350Z 83779ddf6a85: Pull complete 2025-12-04T09:03:03.2005802Z 8b7620c0d736: Pull complete 2025-12-04T09:03:03.9295717Z 3bcfa090e4ef: Pull complete 2025-12-04T09:03:04.3193691Z eb0504ec4d92: Pull complete 2025-12-04T09:03:05.0452515Z 15d0fec09d7b: Pull complete 2025-12-04T09:03:05.4951991Z cca81fcc62a9: Pull complete 2025-12-04T09:03:06.2995168Z b0b8f9b5c6ab: Pull complete 2025-12-04T09:03:06.6299287Z 0606ca4d47a8: Pull complete 
2025-12-04T09:03:07.5572240Z 2f80a4e1b3b9: Pull complete 2025-12-04T09:03:08.0015692Z 35c916fb1bd0: Pull complete 2025-12-04T09:03:14.0175108Z 195537b7dafc: Pull complete 2025-12-04T09:03:14.4663851Z dc454fd3967e: Pull complete 2025-12-04T09:03:14.9101341Z 701b34f115fa: Pull complete 2025-12-04T09:03:15.3272328Z 39cefc00ffed: Pull complete 2025-12-04T09:03:15.7303497Z 6ae51eb61a32: Pull complete 2025-12-04T09:03:16.0952087Z 1fd5341e66df: Pull complete 2025-12-04T09:03:17.6688435Z 72a7c87e35e4: Pull complete 2025-12-04T09:03:18.0652405Z ec36862ac98e: Pull complete 2025-12-04T09:03:19.7814392Z 05ddbf246e8a: Pull complete 2025-12-04T09:03:20.4566128Z Digest: sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97 2025-12-04T09:03:20.5138800Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:03:20.5405571Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:03:20.5479228Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5480334Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5489453Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:20.5490013Z env: 2025-12-04T09:03:20.5490410Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:20.5490693Z ##[endgroup] 2025-12-04T09:03:20.5680280Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2025-12-04T09:03:20.5680751Z with: 2025-12-04T09:03:20.5680994Z driver-version: 580.82.07 2025-12-04T09:03:20.5681270Z env: 2025-12-04T09:03:20.5681504Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:20.5681793Z ##[endgroup] 2025-12-04T09:03:20.5834705Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5835748Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5842183Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:20.5842608Z env: 2025-12-04T09:03:20.5842852Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:20.5843135Z ##[endgroup] 2025-12-04T09:03:20.5977960Z ##[group]Run set -euo pipefail 2025-12-04T09:03:20.5978351Z set -euo pipefail 2025-12-04T09:03:20.5978697Z  2025-12-04T09:03:20.5978935Z has_gpu=false 2025-12-04T09:03:20.5979230Z devices="" 2025-12-04T09:03:20.5979502Z  2025-12-04T09:03:20.5979813Z if command -v nvidia-smi >/dev/null 2>&1; then 2025-12-04T09:03:20.5980346Z  if nvidia-smi -L >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:20.5980807Z  has_gpu=true 2025-12-04T09:03:20.5981158Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:20.5981524Z  fi 2025-12-04T09:03:20.5981774Z fi 2025-12-04T09:03:20.5982029Z  2025-12-04T09:03:20.5982296Z if [ "$has_gpu" = false ]; then 2025-12-04T09:03:20.5982757Z  if ls /dev/nvidia* >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:20.5983199Z  has_gpu=true 2025-12-04T09:03:20.5983546Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:20.5983927Z  fi 2025-12-04T09:03:20.5984161Z fi 2025-12-04T09:03:20.5984409Z  2025-12-04T09:03:20.5984772Z if [ "$has_gpu" = 
false ] && command -v lspci >/dev/null 2>&1; then 2025-12-04T09:03:20.5985366Z  if lspci | grep -i 'nvidia' >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:20.5985859Z  has_gpu=true 2025-12-04T09:03:20.5986205Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:20.5986576Z  fi 2025-12-04T09:03:20.5986812Z fi 2025-12-04T09:03:20.5987059Z  2025-12-04T09:03:20.5987412Z printf 'HAS_NVIDIA=%s\n' "$has_gpu" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5988037Z printf 'DETECTED_DEVICES<> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5993984Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:20.5994402Z env: 2025-12-04T09:03:20.5994625Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:20.5994909Z ##[endgroup] 2025-12-04T09:03:23.7217568Z ##[group]Run if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:03:23.7218052Z if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:03:23.7218499Z  echo "HAS_NVIDIA_GPU=true" >> "${GITHUB_ENV}" 2025-12-04T09:03:23.7219133Z  echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" 2025-12-04T09:03:23.7219681Z else 2025-12-04T09:03:23.7219997Z  echo "HAS_NVIDIA_GPU=false" >> "${GITHUB_ENV}" 2025-12-04T09:03:23.7220402Z fi 2025-12-04T09:03:23.7227510Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:23.7227952Z env: 2025-12-04T09:03:23.7228201Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:23.7228496Z HAS_NVIDIA: true 2025-12-04T09:03:23.7228763Z ##[endgroup] 2025-12-04T09:03:23.7317796Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2025-12-04T09:03:23.7318231Z with: 2025-12-04T09:03:23.7318440Z timeout_minutes: 10 2025-12-04T09:03:23.7318870Z max_attempts: 3 2025-12-04T09:03:23.7351536Z command: # Is it disgusting to have a full shell script here in this github action? Sure # But is it the best way to make it so that this action relies on nothing else? Absolutely set -eou pipefail DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID) DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" install_nvidia_docker2_amzn2() { ( set -x # Needed for yum-config-manager sudo yum install -y yum-utils if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo" else # Amazon Linux 2 YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" fi sudo yum-config-manager --add-repo "${YUM_REPO_URL}" sudo yum install -y \ nvidia-container-toolkit-1.17.8 \ libnvidia-container-tools-1.17.8 \ libnvidia-container1-1.17.8 \ nvidia-container-toolkit-base-1.17.8 sudo systemctl restart docker ) } install_nvidia_docker2_ubuntu20() { ( set -x # Install nvidia-driver package if not installed status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)" if [ ! $? = 0 ] || [ ! "$status" = installed ]; then sudo apt-get install -y nvidia-container-toolkit-1.17.8 sudo systemctl restart docker fi ) } pre_install_nvidia_driver_amzn2() { ( # Purge any nvidia driver installed from RHEL repo sudo yum remove -y nvidia-driver-latest-dkms ) } install_nvidia_driver_common() { ( # Try to gather more information about the runner and its existing NVIDIA driver if any echo "Before installing NVIDIA driver" lspci lsmod modinfo nvidia || true HAS_NVIDIA_DRIVER=0 # Check if NVIDIA driver has already been installed if [ -x "$(command -v nvidia-smi)" ]; then set +e # The driver exists, check its version next. 
Also check only the first GPU if there are more than one of them # so that the same driver version is not print over multiple lines INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing" elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing" # Turn off persistent mode so that the installation script can unload the kernel module sudo killall nvidia-persistenced || true else HAS_NVIDIA_DRIVER=1 echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation" fi set -e fi if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then # CAUTION: this may need to be updated in future if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then sudo yum groupinstall -y "Development Tools" # ensure our kernel install is the same as our underlying kernel, # groupinstall "Development Tools" has a habit of mismatching kernel headers sudo yum install -y "kernel-devel-uname-r == $(uname -r)" sudo modprobe backlight fi sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" set +e sudo /bin/bash /tmp/nvidia_driver -s --no-drm NVIDIA_INSTALLATION_STATUS=$? RESET_GPU=0 if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then sudo cat /var/log/nvidia-installer.log # Fail to install NVIDIA driver, try to reset the GPU RESET_GPU=1 elif [ -x "$(command -v nvidia-smi)" ]; then # Check again if nvidia-smi works even if the driver installation completes successfully INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then RESET_GPU=1 fi fi if [ "$RESET_GPU" -eq 1 ]; then NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1) # The GPU can get stuck in a failure state if somehow the test crashs the GPU microcode. When this # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388 for PCI_ID in $NVIDIA_DEVICES; do DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable) echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)" # This requires sudo permission of course echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset sleep 1 done fi sudo rm -fv /tmp/nvidia_driver set -e fi ) } post_install_nvidia_driver_common() { ( sudo modprobe nvidia || true echo "After installing NVIDIA driver" lspci lsmod modinfo nvidia || true ( set +e nvidia-smi # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in # the case where the driver has already crashed as it still can get the driver version # and some basic information like the bus ID. However, the rest of the information # would be missing (ERR!), for example: # # +-----------------------------------------------------------------------------+ # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | # |-------------------------------+----------------------+----------------------+ # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | # | | | MIG M. | # |===============================+======================+======================| # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! 
| # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | # | | | ERR! | # +-------------------------------+----------------------+----------------------+ # # +-----------------------------------------------------------------------------+ # | Processes: | # | GPU GI CI PID Type Process name GPU Memory | # | ID ID Usage | # |=============================================================================| # +-----------------------------------------------------------------------------+ # # This should be reported as a failure instead as it will guarantee to fail when # Docker tries to run with --gpus all # # So, the correct check here is to query one of the missing piece of info like # GPU name, so that the command can fail accordingly nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 NVIDIA_SMI_STATUS=$? # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285 if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}" else echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}" exit ${NVIDIA_SMI_STATUS} fi set -e ) ) } install_nvidia_driver_amzn2() { ( set -x pre_install_nvidia_driver_amzn2 install_nvidia_driver_common post_install_nvidia_driver_common ) } install_nvidia_driver_ubuntu20() { ( set -x install_nvidia_driver_common post_install_nvidia_driver_common ) } echo "== Installing nvidia driver ${DRIVER_FN} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_driver_amzn2 ;; ubuntu20.04) install_nvidia_driver_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Install container toolkit based on distribution echo "== Installing nvidia container toolkit for ${DISTRIBUTION} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_docker2_amzn2 ;; ubuntu20.04) install_nvidia_docker2_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with # more than one GPUs. This just needs to be run once. The command fails # on subsequent runs and complains that the mode is already on, but that's # ok sudo nvidia-persistenced || true # This should show persistence mode ON nvidia-smi # check if the container-toolkit is correctly installed and CUDA is available inside a container docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi 2025-12-04T09:03:23.7382951Z retry_wait_seconds: 10 2025-12-04T09:03:23.7383276Z polling_interval_seconds: 1 2025-12-04T09:03:23.7383596Z warning_on_retry: true 2025-12-04T09:03:23.7383905Z continue_on_error: false 2025-12-04T09:03:23.7384200Z env: 2025-12-04T09:03:23.7384428Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:23.7384731Z HAS_NVIDIA_GPU: true 2025-12-04T09:03:23.7385090Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:03:23.7385505Z DRIVER_VERSION: 580.82.07 2025-12-04T09:03:23.7385806Z ##[endgroup] 2025-12-04T09:03:23.8741382Z == Installing nvidia driver NVIDIA-Linux-x86_64-580.82.07.run == 2025-12-04T09:03:23.8742447Z + pre_install_nvidia_driver_amzn2 2025-12-04T09:03:23.8743614Z + sudo yum remove -y nvidia-driver-latest-dkms 2025-12-04T09:03:24.4956840Z No match for argument: nvidia-driver-latest-dkms 2025-12-04T09:03:24.4957334Z No packages marked for removal. 2025-12-04T09:03:24.5022882Z Dependencies resolved. 2025-12-04T09:03:24.5032794Z Nothing to do. 2025-12-04T09:03:24.5033587Z Complete! 
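The trace that follows runs install_nvidia_driver_common, which skips reinstallation when the runner already carries the expected driver. A minimal sketch of that decision, assuming DRIVER_VERSION is the env var exported by the action (580.82.07 in this run) and that exit code 14 from nvidia-smi is tolerated, as in the script shown above:

  expected="${DRIVER_VERSION:-580.82.07}"   # assumed to come from the action's env
  if command -v nvidia-smi >/dev/null 2>&1; then
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
    status=$?
    if { [ "$status" -eq 0 ] || [ "$status" -eq 14 ]; } && [ "$installed" = "$expected" ]; then
      echo "NVIDIA driver ($installed) already installed; skipping installation"
    else
      echo "Installing NVIDIA driver $expected (found: ${installed:-none})"
    fi
  fi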
2025-12-04T09:03:24.6212828Z + install_nvidia_driver_common 2025-12-04T09:03:24.6213939Z + echo 'Before installing NVIDIA driver' 2025-12-04T09:03:24.6215172Z + lspci 2025-12-04T09:03:24.6217225Z Before installing NVIDIA driver 2025-12-04T09:03:24.7722598Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:03:24.7723256Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:03:24.7723966Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:03:24.7724632Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:03:24.7725227Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:03:24.7725899Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:03:24.7726526Z 00:1b.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:24.7727111Z 00:1c.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:24.7727675Z 00:1d.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:24.7728645Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:24.7729282Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2025-12-04T09:03:24.7729777Z + lsmod 2025-12-04T09:03:24.7760006Z Module Size Used by 2025-12-04T09:03:24.7760474Z nvidia_uvm 1925120 0 2025-12-04T09:03:24.7760842Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:03:24.7761179Z drm 602112 1 nvidia 2025-12-04T09:03:24.7761547Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:03:24.7761922Z backlight 24576 1 drm 2025-12-04T09:03:24.7762250Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:03:24.7762599Z xt_conntrack 16384 1 2025-12-04T09:03:24.7762911Z nft_chain_nat 16384 3 2025-12-04T09:03:24.7763206Z xt_MASQUERADE 20480 1 2025-12-04T09:03:24.7763561Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:03:24.7763983Z nf_conntrack_netlink 57344 0 2025-12-04T09:03:24.7764492Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:03:24.7765016Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:03:24.7765400Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:03:24.7765758Z xfrm_user 57344 1 2025-12-04T09:03:24.7766062Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:03:24.7766409Z xt_addrtype 16384 2 2025-12-04T09:03:24.7766718Z nft_compat 20480 4 2025-12-04T09:03:24.7767081Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:03:24.7767571Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:03:24.7768027Z br_netfilter 36864 0 2025-12-04T09:03:24.7768358Z bridge 323584 1 br_netfilter 2025-12-04T09:03:24.7768699Z stp 16384 1 bridge 2025-12-04T09:03:24.7769045Z llc 16384 2 bridge,stp 2025-12-04T09:03:24.7769390Z overlay 167936 0 2025-12-04T09:03:24.7769681Z tls 139264 0 2025-12-04T09:03:24.7769981Z nls_ascii 16384 1 2025-12-04T09:03:24.7770284Z nls_cp437 20480 1 2025-12-04T09:03:24.7770570Z vfat 24576 1 2025-12-04T09:03:24.7770868Z fat 86016 1 vfat 2025-12-04T09:03:24.7771187Z sunrpc 700416 1 2025-12-04T09:03:24.7771483Z i8042 45056 0 2025-12-04T09:03:24.7771768Z skx_edac_common 28672 0 2025-12-04T09:03:24.7772066Z ena 184320 0 2025-12-04T09:03:24.7772366Z serio 28672 3 i8042 2025-12-04T09:03:24.7772685Z ghash_clmulni_intel 16384 0 2025-12-04T09:03:24.7772993Z button 24576 0 2025-12-04T09:03:24.7773295Z sch_fq_codel 20480 33 2025-12-04T09:03:24.7773588Z dm_mod 
188416 0 2025-12-04T09:03:24.7773881Z fuse 184320 1 2025-12-04T09:03:24.7774179Z loop 36864 0 2025-12-04T09:03:24.7774469Z configfs 57344 1 2025-12-04T09:03:24.7774770Z dmi_sysfs 20480 0 2025-12-04T09:03:24.7775070Z crc32_pclmul 16384 0 2025-12-04T09:03:24.7775359Z crc32c_intel 24576 0 2025-12-04T09:03:24.7775660Z efivarfs 24576 1 2025-12-04T09:03:24.7775955Z + modinfo nvidia 2025-12-04T09:03:24.7776623Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:03:24.7777388Z import_ns: DMA_BUF 2025-12-04T09:03:24.7777681Z alias: char-major-195-* 2025-12-04T09:03:24.7778013Z version: 580.82.07 2025-12-04T09:03:24.7778318Z supported: external 2025-12-04T09:03:24.7778614Z license: Dual MIT/GPL 2025-12-04T09:03:24.7778966Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:03:24.7779386Z firmware: nvidia/580.82.07/gsp_ga10x.bin 2025-12-04T09:03:24.7779774Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:03:24.7781751Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:03:24.7782193Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:03:24.7782725Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:03:24.7783146Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:03:24.7783562Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:03:24.7783973Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:03:24.7784366Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:03:24.7784742Z depends: i2c-core,drm 2025-12-04T09:03:24.7785046Z retpoline: Y 2025-12-04T09:03:24.7785291Z name: nvidia 2025-12-04T09:03:24.7785726Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:03:24.7786299Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:03:24.7786830Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] 
(charp) 2025-12-04T09:03:24.7787344Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:03:24.7787716Z parm: NVreg_RmLogonRC:int 2025-12-04T09:03:24.7788078Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:03:24.7788445Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:03:24.7788807Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:03:24.7789175Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:03:24.7789591Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:03:24.7790054Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:03:24.7790458Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:03:24.7790806Z parm: NVreg_EnableMSI:int 2025-12-04T09:03:24.7791171Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:03:24.7791601Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:03:24.7792059Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:03:24.7792718Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:03:24.7793175Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:24.7793803Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:03:24.7794276Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:24.7794916Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:03:24.7795319Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:03:24.7795747Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:03:24.7796192Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:03:24.7796605Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:03:24.7796992Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:03:24.7797375Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:03:24.7797763Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:03:24.7798134Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:03:24.7798535Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:03:24.7798973Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:03:24.7799493Z parm: NVreg_RegisterPlatformDeviceDriver:int 2025-12-04T09:03:24.7799898Z parm: NVreg_EnableResizableBar:int 2025-12-04T09:03:24.7800394Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:03:24.7800769Z parm: NVreg_EnableNonblockingOpen:int 2025-12-04T09:03:24.7801149Z parm: NVreg_CoherentGPUMemoryMode:charp 2025-12-04T09:03:24.7801524Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:03:24.7801899Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:03:24.7802264Z parm: NVreg_RmMsg:charp 2025-12-04T09:03:24.7802568Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:03:24.7802926Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:03:24.7803285Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:03:24.7803620Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:03:24.7804058Z parm: NVreg_RmNvlinkBandwidth:charp 2025-12-04T09:03:24.7804517Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-12-04T09:03:24.7804889Z parm: NVreg_ImexChannelCount:int 2025-12-04T09:03:24.7805244Z parm: NVreg_CreateImexChannel0:int 2025-12-04T09:03:24.7805623Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-12-04T09:03:24.7805997Z parm: rm_firmware_active:charp 2025-12-04T09:03:24.7806303Z + HAS_NVIDIA_DRIVER=0 2025-12-04T09:03:24.7806574Z ++ command -v nvidia-smi 2025-12-04T09:03:24.7806855Z + '[' -x /usr/bin/nvidia-smi ']' 2025-12-04T09:03:24.7807123Z + set +e 2025-12-04T09:03:24.7807460Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2025-12-04T09:03:27.8739408Z + INSTALLED_DRIVER_VERSION=580.82.07 2025-12-04T09:03:27.8739842Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:03:27.8740384Z + '[' 0 -ne 0 ']' 2025-12-04T09:03:27.8740651Z + '[' 580.82.07 
'!=' 580.82.07 ']' 2025-12-04T09:03:27.8740972Z + HAS_NVIDIA_DRIVER=1 2025-12-04T09:03:27.8741562Z + echo 'NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation' 2025-12-04T09:03:27.8742154Z + set -e 2025-12-04T09:03:27.8742390Z + '[' 1 -eq 0 ']' 2025-12-04T09:03:27.8742849Z NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation 2025-12-04T09:03:27.8743438Z + post_install_nvidia_driver_common 2025-12-04T09:03:27.8744966Z + sudo modprobe nvidia 2025-12-04T09:03:28.0397295Z + echo 'After installing NVIDIA driver' 2025-12-04T09:03:28.0397799Z + lspci 2025-12-04T09:03:28.0398653Z After installing NVIDIA driver 2025-12-04T09:03:28.0520199Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:03:28.0521115Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:03:28.0522073Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:03:28.0522754Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:03:28.0523401Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:03:28.0524078Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:03:28.0524702Z 00:1b.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:28.0525293Z 00:1c.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:28.0525870Z 00:1d.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:28.0526453Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:28.0527066Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. 
NVMe SSD Controller 2025-12-04T09:03:28.0527570Z + lsmod 2025-12-04T09:03:28.0544990Z Module Size Used by 2025-12-04T09:03:28.0545364Z nvidia_uvm 1925120 0 2025-12-04T09:03:28.0545703Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:03:28.0546065Z drm 602112 1 nvidia 2025-12-04T09:03:28.0546456Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:03:28.0546834Z backlight 24576 1 drm 2025-12-04T09:03:28.0547189Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:03:28.0547552Z xt_conntrack 16384 1 2025-12-04T09:03:28.0547878Z nft_chain_nat 16384 3 2025-12-04T09:03:28.0548199Z xt_MASQUERADE 20480 1 2025-12-04T09:03:28.0548681Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:03:28.0549084Z nf_conntrack_netlink 57344 0 2025-12-04T09:03:28.0549550Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:03:28.0550081Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:03:28.0550457Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:03:28.0550796Z xfrm_user 57344 1 2025-12-04T09:03:28.0551114Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:03:28.0551456Z xt_addrtype 16384 2 2025-12-04T09:03:28.0551751Z nft_compat 20480 4 2025-12-04T09:03:28.0553162Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:03:28.0553677Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:03:28.0554119Z br_netfilter 36864 0 2025-12-04T09:03:28.0554453Z bridge 323584 1 br_netfilter 2025-12-04T09:03:28.0554815Z stp 16384 1 bridge 2025-12-04T09:03:28.0555153Z llc 16384 2 bridge,stp 2025-12-04T09:03:28.0555483Z overlay 167936 0 2025-12-04T09:03:28.0555783Z tls 139264 0 2025-12-04T09:03:28.0556082Z nls_ascii 16384 1 2025-12-04T09:03:28.0556411Z nls_cp437 20480 1 2025-12-04T09:03:28.0556706Z vfat 24576 1 2025-12-04T09:03:28.0557005Z fat 86016 1 vfat 2025-12-04T09:03:28.0557314Z sunrpc 700416 1 2025-12-04T09:03:28.0557608Z i8042 45056 0 2025-12-04T09:03:28.0557903Z skx_edac_common 28672 0 2025-12-04T09:03:28.0558194Z ena 184320 0 2025-12-04T09:03:28.0558501Z serio 28672 3 i8042 2025-12-04T09:03:28.0558837Z ghash_clmulni_intel 16384 0 2025-12-04T09:03:28.0559131Z button 24576 0 2025-12-04T09:03:28.0559449Z sch_fq_codel 20480 33 2025-12-04T09:03:28.0559757Z dm_mod 188416 0 2025-12-04T09:03:28.0560052Z fuse 184320 1 2025-12-04T09:03:28.0560332Z loop 36864 0 2025-12-04T09:03:28.0560630Z configfs 57344 1 2025-12-04T09:03:28.0560932Z dmi_sysfs 20480 0 2025-12-04T09:03:28.0561219Z crc32_pclmul 16384 0 2025-12-04T09:03:28.0561519Z crc32c_intel 24576 0 2025-12-04T09:03:28.0561822Z efivarfs 24576 1 2025-12-04T09:03:28.0562108Z + modinfo nvidia 2025-12-04T09:03:28.0562614Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:03:28.0563163Z import_ns: DMA_BUF 2025-12-04T09:03:28.0563460Z alias: char-major-195-* 2025-12-04T09:03:28.0563773Z version: 580.82.07 2025-12-04T09:03:28.0564068Z supported: external 2025-12-04T09:03:28.0564364Z license: Dual MIT/GPL 2025-12-04T09:03:28.0564692Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:03:28.0565095Z firmware: nvidia/580.82.07/gsp_ga10x.bin 2025-12-04T09:03:28.0565482Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:03:28.0565878Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:03:28.0566286Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:03:28.0566701Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:03:28.0567117Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:03:28.0567510Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:03:28.0567912Z alias: 
pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:03:28.0568312Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:03:28.0568676Z depends: i2c-core,drm 2025-12-04T09:03:28.0568983Z retpoline: Y 2025-12-04T09:03:28.0569238Z name: nvidia 2025-12-04T09:03:28.0569658Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:03:28.0570231Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:03:28.0570772Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-12-04T09:03:28.0571279Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:03:28.0571640Z parm: NVreg_RmLogonRC:int 2025-12-04T09:03:28.0572000Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:03:28.0572380Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:03:28.0572730Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:03:28.0573093Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:03:28.0573525Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:03:28.0573974Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:03:28.0574475Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:03:28.0574908Z parm: NVreg_EnableMSI:int 2025-12-04T09:03:28.0575275Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:03:28.0575693Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:03:28.0576161Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:03:28.0576893Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:03:28.0577392Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:28.0577948Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:03:28.0578466Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:28.0578965Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:03:28.0579389Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:03:28.0579850Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:03:28.0580315Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:03:28.0580738Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:03:28.0581142Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:03:28.0581554Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:03:28.0581945Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:03:28.0582335Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:03:28.0582767Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:03:28.0583205Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:03:28.0583645Z parm: NVreg_RegisterPlatformDeviceDriver:int 2025-12-04T09:03:28.0584089Z parm: NVreg_EnableResizableBar:int 2025-12-04T09:03:28.0584506Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:03:28.0584920Z parm: NVreg_EnableNonblockingOpen:int 2025-12-04T09:03:28.0585363Z parm: NVreg_CoherentGPUMemoryMode:charp 2025-12-04T09:03:28.0585787Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:03:28.0586196Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:03:28.0586612Z parm: NVreg_RmMsg:charp 2025-12-04T09:03:28.0586968Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:03:28.0587352Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:03:28.0587750Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:03:28.0588281Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:03:28.0588673Z parm: NVreg_RmNvlinkBandwidth:charp 2025-12-04T09:03:28.0589083Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-12-04T09:03:28.0589501Z parm: NVreg_ImexChannelCount:int 2025-12-04T09:03:28.0589891Z parm: NVreg_CreateImexChannel0:int 2025-12-04T09:03:28.0590295Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-12-04T09:03:28.0590702Z parm: rm_firmware_active:charp 
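Immediately below, the script verifies that the existing driver is actually healthy. Because nvidia-smi can exit 0 even when the driver has crashed (printing ERR! for most fields), it queries a specific field and accepts only exit codes 0 and 14. A condensed sketch of that check, mirroring the script echoed earlier in this step:

  set +e
  nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
  status=$?
  set -e
  if [ "$status" -eq 0 ] || [ "$status" -eq 14 ]; then
    echo "INFO: Ignoring allowed status ${status}"
  else
    echo "ERROR: nvidia-smi exited with unresolved status ${status}"
    exit "$status"
  fi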
2025-12-04T09:03:28.0591048Z + set +e
2025-12-04T09:03:28.0591261Z + nvidia-smi
2025-12-04T09:03:29.8723157Z Thu Dec 4 09:03:29 2025
2025-12-04T09:03:29.8723692Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:03:29.8724338Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 |
2025-12-04T09:03:29.8724948Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:29.8725578Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:03:29.8726248Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2025-12-04T09:03:29.8726797Z | | | MIG M. |
2025-12-04T09:03:29.8727258Z |=========================================+========================+======================|
2025-12-04T09:03:29.9108047Z | 0 Tesla T4 Off | 00000000:00:1B.0 Off | 0 |
2025-12-04T09:03:29.9108723Z | N/A 30C P0 24W / 70W | 0MiB / 15360MiB | 3% Default |
2025-12-04T09:03:29.9109712Z | | | N/A |
2025-12-04T09:03:29.9110212Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:29.9110753Z | 1 Tesla T4 Off | 00000000:00:1C.0 Off | 0 |
2025-12-04T09:03:29.9111279Z | N/A 29C P0 25W / 70W | 0MiB / 15360MiB | 5% Default |
2025-12-04T09:03:29.9111731Z | | | N/A |
2025-12-04T09:03:29.9112215Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:29.9112754Z | 2 Tesla T4 Off | 00000000:00:1D.0 Off | 0 |
2025-12-04T09:03:29.9113266Z | N/A 28C P0 25W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:03:29.9113727Z | | | N/A |
2025-12-04T09:03:29.9114210Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:29.9114741Z | 3 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
2025-12-04T09:03:29.9115251Z | N/A 29C P0 25W / 70W | 0MiB / 15360MiB | 5% Default |
2025-12-04T09:03:29.9115698Z | | | N/A |
2025-12-04T09:03:29.9116177Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:29.9116524Z
2025-12-04T09:03:29.9116742Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:03:29.9117269Z | Processes: |
2025-12-04T09:03:29.9117805Z | GPU GI CI PID Type Process name GPU Memory |
2025-12-04T09:03:29.9118314Z | ID ID Usage |
2025-12-04T09:03:29.9118731Z |=========================================================================================|
2025-12-04T09:03:29.9131856Z | No running processes found |
2025-12-04T09:03:29.9132467Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:03:31.6090965Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T09:03:33.3999654Z Tesla T4
2025-12-04T09:03:34.7087719Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:03:34.7088112Z + '[' 0 -eq 0 ']'
2025-12-04T09:03:34.7088404Z + echo 'INFO: Ignoring allowed status 0'
2025-12-04T09:03:34.7088771Z + set -e
2025-12-04T09:03:34.7089030Z INFO: Ignoring allowed status 0
2025-12-04T09:03:34.7094313Z == Installing nvidia container toolkit for amzn2023 ==
2025-12-04T09:03:34.7098383Z + sudo yum install -y yum-utils
2025-12-04T09:03:35.1764045Z Last metadata expiration check: 0:07:31 ago on Thu Dec 4 08:56:04 2025.
2025-12-04T09:03:35.2072605Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed.
2025-12-04T09:03:35.2673598Z Dependencies resolved.
2025-12-04T09:03:35.2974517Z Nothing to do.
2025-12-04T09:03:35.2975057Z Complete!
2025-12-04T09:03:35.6842778Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]]
2025-12-04T09:03:35.6843531Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:03:35.6844611Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:03:36.0725931Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:03:36.1213798Z + sudo yum install -y nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8
2025-12-04T09:03:36.7022893Z nvidia-container-toolkit 21 kB/s | 833 B 00:00
2025-12-04T09:03:36.7971515Z Dependencies resolved.
2025-12-04T09:03:36.8275062Z ================================================================================
2025-12-04T09:03:36.8275644Z Package Arch Version Repository Size
2025-12-04T09:03:36.8276121Z ================================================================================
2025-12-04T09:03:36.8276501Z Downgrading:
2025-12-04T09:03:36.8276942Z libnvidia-container-tools x86_64 1.17.8-1 nvidia-container-toolkit 40 k
2025-12-04T09:03:36.8277641Z libnvidia-container1 x86_64 1.17.8-1 nvidia-container-toolkit 1.0 M
2025-12-04T09:03:36.8278329Z nvidia-container-toolkit x86_64 1.17.8-1 nvidia-container-toolkit 1.2 M
2025-12-04T09:03:36.8279056Z nvidia-container-toolkit-base x86_64 1.17.8-1 nvidia-container-toolkit 5.8 M
2025-12-04T09:03:36.8279521Z
2025-12-04T09:03:36.8279637Z Transaction Summary
2025-12-04T09:03:36.8279938Z ================================================================================
2025-12-04T09:03:36.8280318Z Downgrade 4 Packages
2025-12-04T09:03:36.8280494Z
2025-12-04T09:03:36.8280616Z Total download size: 8.0 M
2025-12-04T09:03:36.8280928Z Downloading Packages:
2025-12-04T09:03:36.9135691Z (1/4): libnvidia-container-tools-1.17.8-1.x86_6 480 kB/s | 40 kB 00:00
2025-12-04T09:03:36.9785691Z (2/4): libnvidia-container1-1.17.8-1.x86_64.rpm 6.5 MB/s | 1.0 MB 00:00
2025-12-04T09:03:37.0146669Z (3/4): nvidia-container-toolkit-1.17.8-1.x86_64 6.7 MB/s | 1.2 MB 00:00
2025-12-04T09:03:37.1319197Z (4/4): nvidia-container-toolkit-base-1.17.8-1.x 26 MB/s | 5.8 MB 00:00
2025-12-04T09:03:37.1326496Z --------------------------------------------------------------------------------
2025-12-04T09:03:37.1330209Z Total 26 MB/s | 8.0 MB 00:00
2025-12-04T09:03:37.1332904Z Running transaction check
2025-12-04T09:03:37.1481244Z Transaction check succeeded.
2025-12-04T09:03:37.1481633Z Running transaction test
2025-12-04T09:03:37.2006550Z Transaction test succeeded.
2025-12-04T09:03:37.2007560Z Running transaction
2025-12-04T09:03:38.0868905Z Preparing : 1/1
2025-12-04T09:03:38.2086147Z Downgrading : nvidia-container-toolkit-base-1.17.8-1.x86_64 1/8
2025-12-04T09:03:38.2207679Z Downgrading : libnvidia-container1-1.17.8-1.x86_64 2/8
2025-12-04T09:03:38.2653438Z Running scriptlet: libnvidia-container1-1.17.8-1.x86_64 2/8
2025-12-04T09:03:38.3938045Z Downgrading : libnvidia-container-tools-1.17.8-1.x86_64 3/8
2025-12-04T09:03:38.4628340Z Downgrading : nvidia-container-toolkit-1.17.8-1.x86_64 4/8
2025-12-04T09:03:38.5165810Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 4/8
2025-12-04T09:03:38.5213884Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8
2025-12-04T09:03:38.5214616Z Cleanup : nvidia-container-toolkit-1.18.1-1.x86_64 5/8
2025-12-04T09:03:38.5538203Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8
2025-12-04T09:03:38.5582800Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8
2025-12-04T09:03:38.5583523Z Cleanup : libnvidia-container-tools-1.18.1-1.x86_64 6/8
2025-12-04T09:03:38.5964346Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8
2025-12-04T09:03:38.6013994Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8
2025-12-04T09:03:38.6014707Z Cleanup : libnvidia-container1-1.18.1-1.x86_64 7/8
2025-12-04T09:03:38.6568407Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8
2025-12-04T09:03:38.6613200Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:03:38.6613929Z Cleanup : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:03:38.7178384Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:03:38.7715857Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 8/8
2025-12-04T09:04:48.5083962Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:04:48.5084739Z Verifying : libnvidia-container-tools-1.17.8-1.x86_64 1/8
2025-12-04T09:04:48.5085402Z Verifying : libnvidia-container-tools-1.18.1-1.x86_64 2/8
2025-12-04T09:04:48.5086055Z Verifying : libnvidia-container1-1.17.8-1.x86_64 3/8
2025-12-04T09:04:48.5086682Z Verifying : libnvidia-container1-1.18.1-1.x86_64 4/8
2025-12-04T09:04:48.5087356Z Verifying : nvidia-container-toolkit-1.17.8-1.x86_64 5/8
2025-12-04T09:04:48.5088009Z Verifying : nvidia-container-toolkit-1.18.1-1.x86_64 6/8
2025-12-04T09:04:48.5088659Z Verifying : nvidia-container-toolkit-base-1.17.8-1.x86_64 7/8
2025-12-04T09:04:48.6697321Z Verifying : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:04:48.7300909Z
2025-12-04T09:04:48.7301080Z
2025-12-04T09:04:48.7301183Z Downgraded:
2025-12-04T09:04:48.7301644Z libnvidia-container-tools-1.17.8-1.x86_64
2025-12-04T09:04:48.7302345Z libnvidia-container1-1.17.8-1.x86_64
2025-12-04T09:04:48.7303036Z nvidia-container-toolkit-1.17.8-1.x86_64
2025-12-04T09:04:48.7303746Z nvidia-container-toolkit-base-1.17.8-1.x86_64
2025-12-04T09:04:48.7304188Z
2025-12-04T09:04:48.7304287Z Complete!
2025-12-04T09:04:48.8056574Z + sudo systemctl restart docker
2025-12-04T09:04:57.6536460Z Thu Dec 4 09:04:57 2025
2025-12-04T09:04:57.6537209Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:57.6537854Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 |
2025-12-04T09:04:57.6538518Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:57.6539152Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:04:57.6539830Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2025-12-04T09:04:57.6540382Z | | | MIG M. |
2025-12-04T09:04:57.6540793Z |=========================================+========================+======================|
2025-12-04T09:04:57.6932092Z | 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 |
2025-12-04T09:04:57.6932671Z | N/A 30C P0 24W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:04:57.6933163Z | | | N/A |
2025-12-04T09:04:57.6933784Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:57.6934324Z | 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
2025-12-04T09:04:57.6934835Z | N/A 29C P0 25W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:04:57.6935283Z | | | N/A |
2025-12-04T09:04:57.6935766Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:57.6936402Z | 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 |
2025-12-04T09:04:57.6937092Z | N/A 28C P0 25W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:04:57.6937562Z | | | N/A |
2025-12-04T09:04:57.6938060Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:57.6938994Z | 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
2025-12-04T09:04:57.6939524Z | N/A 29C P0 25W / 70W | 0MiB / 15360MiB | 9% Default |
2025-12-04T09:04:57.6939992Z | | | N/A |
2025-12-04T09:04:57.6940486Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:57.6940861Z
2025-12-04T09:04:57.6941075Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:57.6941618Z | Processes: |
2025-12-04T09:04:57.6942162Z | GPU GI CI PID Type Process name GPU Memory |
2025-12-04T09:04:57.6942681Z | ID ID Usage |
2025-12-04T09:04:57.6943119Z |=========================================================================================|
2025-12-04T09:04:57.6958340Z | No running processes found |
2025-12-04T09:04:57.6959355Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:58.0896022Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally
2025-12-04T09:04:58.3181514Z 3.13: Pulling from docker/library/python
2025-12-04T09:04:58.3975888Z 53c88f1dfeb7: Pulling fs layer
2025-12-04T09:04:58.3976405Z eae668646f44: Pulling fs layer
2025-12-04T09:04:58.3976912Z ff2e6e687b6c: Pulling fs layer
2025-12-04T09:04:58.3977284Z 7c40a3faff76: Pulling fs layer
2025-12-04T09:04:58.3977602Z 967a3b1c8fef: Pulling fs layer
2025-12-04T09:04:58.3977934Z a64e1a44f22a: Pulling fs layer
2025-12-04T09:04:58.3978258Z 52655f8a5bcc: Pulling fs layer
2025-12-04T09:04:58.3978561Z 7c40a3faff76: Waiting
2025-12-04T09:04:58.3978870Z 967a3b1c8fef: Waiting
2025-12-04T09:04:58.3979162Z a64e1a44f22a: Waiting
2025-12-04T09:04:58.3979421Z 52655f8a5bcc: Waiting
2025-12-04T09:04:58.5470550Z eae668646f44: Verifying Checksum
2025-12-04T09:04:58.5470937Z eae668646f44: Download complete
2025-12-04T09:04:58.6387423Z 53c88f1dfeb7: Download complete
2025-12-04T09:04:58.7104632Z 967a3b1c8fef: Verifying Checksum
2025-12-04T09:04:58.7105042Z 967a3b1c8fef: Download complete
2025-12-04T09:04:58.7376412Z ff2e6e687b6c: Verifying Checksum
2025-12-04T09:04:58.7377010Z ff2e6e687b6c: Download complete
2025-12-04T09:04:58.7634250Z 52655f8a5bcc: Download complete
2025-12-04T09:04:58.8530843Z a64e1a44f22a: Verifying Checksum
2025-12-04T09:04:58.8531252Z a64e1a44f22a: Download complete
2025-12-04T09:04:59.5804177Z 7c40a3faff76: Verifying Checksum
2025-12-04T09:04:59.5804604Z 7c40a3faff76: Download complete
2025-12-04T09:04:59.8193818Z 53c88f1dfeb7: Pull complete
2025-12-04T09:05:00.3561203Z eae668646f44: Pull complete
2025-12-04T09:05:02.0614429Z ff2e6e687b6c: Pull complete
2025-12-04T09:05:06.9540752Z 7c40a3faff76: Pull complete
2025-12-04T09:05:07.1527067Z 967a3b1c8fef: Pull complete
2025-12-04T09:05:07.7168197Z a64e1a44f22a: Pull complete
2025-12-04T09:05:07.7412467Z 52655f8a5bcc: Pull complete
2025-12-04T09:05:07.7567834Z Digest: sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0
2025-12-04T09:05:07.7613093Z Status: Downloaded newer image for public.ecr.aws/docker/library/python:3.13
2025-12-04T09:05:16.3600656Z Thu Dec 4 09:05:16 2025
2025-12-04T09:05:16.3601191Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:05:16.3601824Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 |
2025-12-04T09:05:16.3602439Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:05:16.3603045Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:05:16.3604157Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2025-12-04T09:05:16.3604693Z | | | MIG M. |
2025-12-04T09:05:16.3605104Z |=========================================+========================+======================|
2025-12-04T09:05:16.4197960Z | 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 |
2025-12-04T09:05:16.4198655Z | N/A 28C P8 13W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:05:16.4199151Z | | | N/A |
2025-12-04T09:05:16.4199643Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:05:16.4200165Z | 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
2025-12-04T09:05:16.4200676Z | N/A 28C P8 13W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:05:16.4201168Z | | | N/A |
2025-12-04T09:05:16.4201650Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:05:16.4202168Z | 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 |
2025-12-04T09:05:16.4202676Z | N/A 27C P8 13W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:05:16.4203136Z | | | N/A |
2025-12-04T09:05:16.4203616Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:05:16.4204131Z | 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
2025-12-04T09:05:16.4204638Z | N/A 28C P8 13W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:05:16.4205109Z | | | N/A |
2025-12-04T09:05:16.4205577Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:05:16.4205938Z
2025-12-04T09:05:16.4206146Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:05:16.4206671Z | Processes: |
2025-12-04T09:05:16.4207210Z | GPU GI CI PID Type Process name GPU Memory |
2025-12-04T09:05:16.4207704Z | ID ID Usage |
2025-12-04T09:05:16.4208125Z |=========================================================================================|
2025-12-04T09:05:16.4224230Z | No running processes found |
2025-12-04T09:05:16.4224905Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:05:17.9736049Z Command completed after 1 attempt(s).
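This completes the GPU runtime setup: nvidia-smi works on the host, the NVIDIA container toolkit is pinned to 1.17.8 (a downgrade, since 1.18.1 was installed), Docker is restarted, and nvidia-smi is re-run from a freshly pulled python:3.13 container to prove GPU passthrough. A condensed sketch of the same sequence, assuming an Amazon Linux 2023 host with the NVIDIA yum repo already added as shown above:

# Pin all four toolkit packages to one version; dnf downgrades when a newer build is present.
sudo yum install -y \
    nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 \
    libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8
# Restart Docker so the container runtime picks up the toolkit change.
sudo systemctl restart docker
# Smoke test: all four T4s must be visible from inside a container.
docker run --rm --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all \
    public.ecr.aws/docker/library/python:3.13 nvidia-smi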
2025-12-04T09:05:17.9830461Z Prepare all required actions
2025-12-04T09:05:17.9865764Z ##[group]Run ./.github/actions/get-workflow-job-id
2025-12-04T09:05:17.9866240Z with:
2025-12-04T09:05:17.9866970Z github-token: ***
2025-12-04T09:05:17.9867413Z env:
2025-12-04T09:05:17.9867732Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:17.9868130Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:17.9868652Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:17.9869268Z ##[endgroup]
2025-12-04T09:05:17.9888043Z ##[group]Run set -eux
2025-12-04T09:05:17.9888441Z set -eux
2025-12-04T09:05:17.9889017Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}"
2025-12-04T09:05:17.9900082Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:17.9900710Z env:
2025-12-04T09:05:17.9901256Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:17.9901687Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:17.9902286Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:17.9903016Z GITHUB_TOKEN: ***
2025-12-04T09:05:17.9903388Z ##[endgroup]
2025-12-04T09:05:17.9936442Z + python3 .github/scripts/get_workflow_job_id.py 19922768520 i-035b9d8fd6b020edf
2025-12-04T09:05:19.2302934Z Setting output job-id=57116084904
2025-12-04T09:05:19.2303775Z Setting output job-name=linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check)
2025-12-04T09:05:19.2418129Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:05:19.2418995Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:05:19.2420124Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
2025-12-04T09:05:19.2421374Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:05:19.2427797Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:19.2428226Z env:
2025-12-04T09:05:19.2428458Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:19.2428757Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:19.2429106Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:19.2429497Z JOB_ID: 57116084904
2025-12-04T09:05:19.2430186Z JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check)
2025-12-04T09:05:19.2430891Z WORKFLOW_NAME: trunk
2025-12-04T09:05:19.2431175Z WORKFLOW_RUN_ID: 19922768520
2025-12-04T09:05:19.2431640Z MONITOR_LOG_INTERVAL: 5
2025-12-04T09:05:19.2431953Z MONITOR_DATA_COLLECT_INTERVAL: 1
2025-12-04T09:05:19.2432521Z ##[endgroup]
2025-12-04T09:05:19.5611306Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T09:05:19.9454944Z Collecting psutil==5.9.8
2025-12-04T09:05:19.9623696Z Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB)
2025-12-04T09:05:20.0396064Z Collecting dataclasses_json==0.6.7
2025-12-04T09:05:20.0442635Z Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
2025-12-04T09:05:20.0715469Z Collecting nvidia-ml-py==11.525.84
2025-12-04T09:05:20.0754994Z Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB)
2025-12-04T09:05:20.2002121Z Collecting marshmallow<4.0.0,>=3.18.0
2025-12-04T09:05:20.2040201Z Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB)
2025-12-04T09:05:20.2274690Z Collecting typing-inspect<1,>=0.4.0
2025-12-04T09:05:20.2315735Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
2025-12-04T09:05:20.2894860Z Collecting packaging>=17.0
2025-12-04T09:05:20.2931405Z Downloading packaging-25.0-py3-none-any.whl (66 kB)
2025-12-04T09:05:20.3478455Z Collecting typing-extensions>=3.7.4
2025-12-04T09:05:20.3520429Z Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
2025-12-04T09:05:20.3720147Z Collecting mypy-extensions>=0.3.0
2025-12-04T09:05:20.3755332Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB)
2025-12-04T09:05:20.4815893Z Installing collected packages: typing-extensions, packaging, mypy-extensions, typing-inspect, marshmallow, psutil, nvidia-ml-py, dataclasses-json
2025-12-04T09:05:20.7765326Z Successfully installed dataclasses-json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0
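The step above installs the usage monitor's dependencies (psutil, dataclasses_json, nvidia-ml-py) and, per the Run block shown earlier, launches tools.stats.monitor in the background, saving its PID as a step output so a later step can stop it and upload usage_log.txt. The same background-launch pattern, sketched with the intervals from this job (log every 5 s, collect every 1 s):

# Launch the resource monitor in the background, logging to usage_log.txt.
python3 -m tools.stats.monitor --log-interval 5 --data-collect-interval 1 > usage_log.txt 2>&1 &
# $! is the PID of the last background job; writing it to GITHUB_OUTPUT makes it
# available to later workflow steps as steps.<id>.outputs.monitor-script-pid.
echo "monitor-script-pid=$!" >> "${GITHUB_OUTPUT}"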
2025-12-04T09:05:20.9635695Z Prepare all required actions
2025-12-04T09:05:20.9636263Z Getting action download info
2025-12-04T09:05:21.1818447Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:05:21.4467634Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093)
2025-12-04T09:05:21.7482417Z ##[group]Run ./.github/actions/download-build-artifacts
2025-12-04T09:05:21.7482836Z with:
2025-12-04T09:05:21.7483108Z name: linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T09:05:21.7483461Z s3-bucket: gha-artifacts
2025-12-04T09:05:21.7483759Z env:
2025-12-04T09:05:21.7483979Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:21.7484272Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:21.7484630Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:21.7485031Z ##[endgroup]
2025-12-04T09:05:21.7516959Z ##[group]Run seemethere/download-artifact-s3@v4
2025-12-04T09:05:21.7517332Z with:
2025-12-04T09:05:21.7517626Z name: linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T09:05:21.7517972Z s3-bucket: gha-artifacts
2025-12-04T09:05:21.7518255Z region: us-east-1
2025-12-04T09:05:21.7518526Z env:
2025-12-04T09:05:21.7518760Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:21.7519067Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:21.7519423Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:21.7519826Z ##[endgroup]
2025-12-04T09:05:22.2508863Z (node:62861) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
2025-12-04T09:05:22.2509510Z
2025-12-04T09:05:22.2509740Z Please migrate your code to use AWS SDK for JavaScript (v3).
2025-12-04T09:05:22.2510357Z For more information, check the migration guide at https://a.co/7PzMCcy
2025-12-04T09:05:22.2510992Z (Use `node --trace-warnings ...` to show where the warning was created)
2025-12-04T09:05:22.4982654Z Found 1 objects with prefix pytorch/pytorch/19922768520/linux-jammy-cuda12.8-py3.10-gcc11/
2025-12-04T09:05:22.4983532Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:05:30.7516386Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:05:30.7521860Z Artifact download has finished successfully
2025-12-04T09:05:30.7720246Z ##[group]Run unzip -o artifacts.zip
2025-12-04T09:05:30.7720654Z unzip -o artifacts.zip
2025-12-04T09:05:30.7727749Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:30.7728178Z env:
2025-12-04T09:05:30.7728430Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:30.7728744Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:30.7729096Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:30.7729511Z ##[endgroup]
2025-12-04T09:05:30.7805532Z Archive: artifacts.zip
2025-12-04T09:05:30.7805917Z creating: dist/
2025-12-04T09:05:30.7942244Z inflating: dist/.ninja_log
2025-12-04T09:05:33.3120656Z inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl
2025-12-04T09:05:33.3121719Z creating: build/
2025-12-04T09:05:33.3122040Z creating: build/custom_test_artifacts/
2025-12-04T09:05:33.3122507Z creating: build/custom_test_artifacts/custom-op-build/
2025-12-04T09:05:33.3123068Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/
2025-12-04T09:05:33.3123781Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/
2025-12-04T09:05:33.3128633Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml
2025-12-04T09:05:33.3129435Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/
2025-12-04T09:05:33.3130208Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake
2025-12-04T09:05:33.3131047Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:05:33.3132153Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:05:33.3133110Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c
2025-12-04T09:05:33.3134147Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out
2025-12-04T09:05:33.3135026Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake
2025-12-04T09:05:33.3136013Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:05:33.3137108Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:05:33.3138082Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp
2025-12-04T09:05:33.3139438Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out
2025-12-04T09:05:33.3140384Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake
2025-12-04T09:05:33.3141930Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin
2025-12-04T09:05:33.3143839Z inflating:
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:05:33.3144800Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:05:33.3145656Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:05:33.3201753Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:05:33.3258142Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:05:33.3259443Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:05:33.3317838Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:05:33.3319074Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:05:33.3320299Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:05:33.3321957Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:05:33.3323204Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:05:33.3324438Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:05:33.3325672Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:05:33.3326869Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:05:33.3328049Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:05:33.3329171Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:05:33.3330253Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:05:33.3331311Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:05:33.3332384Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:05:33.3333732Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:05:33.3334759Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:05:33.3405944Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:05:33.3406891Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:05:33.3482958Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:05:33.3483907Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:05:33.3484607Z creating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:05:33.3485349Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-12-04T09:05:33.3486124Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-12-04T09:05:33.3486976Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-12-04T09:05:33.3487938Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-12-04T09:05:33.3488879Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2025-12-04T09:05:33.3489749Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-12-04T09:05:33.3490641Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-12-04T09:05:33.3491530Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-12-04T09:05:33.3492428Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-12-04T09:05:33.3493328Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-12-04T09:05:33.3494217Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-12-04T09:05:33.3510110Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-12-04T09:05:33.3698668Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-12-04T09:05:33.3699580Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-12-04T09:05:33.3700523Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-12-04T09:05:33.3701577Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-12-04T09:05:33.3702598Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-12-04T09:05:33.3703546Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-12-04T09:05:33.3704520Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-12-04T09:05:33.3705498Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-12-04T09:05:33.3706479Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-12-04T09:05:33.3707467Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-12-04T09:05:33.3708429Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-12-04T09:05:33.3725121Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-12-04T09:05:33.3804041Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-12-04T09:05:33.3805283Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:05:33.3806203Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:05:33.3807015Z extracting: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-12-04T09:05:33.3807764Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-12-04T09:05:33.3808602Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-12-04T09:05:33.3809358Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2025-12-04T09:05:33.3810045Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-12-04T09:05:33.3810685Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-12-04T09:05:33.3811335Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-12-04T09:05:33.3974849Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-12-04T09:05:33.4025582Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-12-04T09:05:33.4026233Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-12-04T09:05:33.4026798Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2025-12-04T09:05:33.4027476Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:05:33.4033791Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:05:33.4034554Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/ 2025-12-04T09:05:33.4035297Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:05:33.4036101Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:05:33.4036870Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:05:33.4037779Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:05:33.4038714Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:05:33.4039573Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:05:33.4040390Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:05:33.4041191Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:05:33.4042281Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:05:33.4043668Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:05:33.4044547Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:05:33.4046062Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:05:33.4047808Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:05:33.4048727Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:05:33.4049528Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:05:33.4104923Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:05:33.4164131Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 
2025-12-04T09:05:33.4165382Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:05:33.4222622Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:05:33.4223872Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:05:33.4225118Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:05:33.4226516Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:05:33.4227748Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:05:33.4228931Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:05:33.4230145Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:05:33.4231354Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:05:33.4232524Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:05:33.4233707Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:05:33.4234760Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:05:33.4235788Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:05:33.4236823Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:05:33.4237811Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:05:33.4238832Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:05:33.4312420Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:05:33.4313360Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:05:33.4388680Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:05:33.4389711Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:05:33.4390414Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:05:33.4391137Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-12-04T09:05:33.4391906Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-12-04T09:05:33.4392808Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-12-04T09:05:33.4393813Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-12-04T09:05:33.4394771Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-12-04T09:05:33.4395650Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-12-04T09:05:33.4396574Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-12-04T09:05:33.4397503Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-12-04T09:05:33.4398433Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-12-04T09:05:33.4399344Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-12-04T09:05:33.4400435Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-12-04T09:05:33.4416491Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-12-04T09:05:33.4478764Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-12-04T09:05:33.4479983Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:05:33.4480870Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:05:33.4481672Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-12-04T09:05:33.4482404Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-12-04T09:05:33.4483117Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-12-04T09:05:33.4483859Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2025-12-04T09:05:33.4484543Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-12-04T09:05:33.4485166Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-12-04T09:05:33.4485790Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-12-04T09:05:33.4522708Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-12-04T09:05:33.4523385Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-12-04T09:05:33.4524007Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-12-04T09:05:33.4524738Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:05:33.4531033Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:05:33.4531881Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/ 2025-12-04T09:05:33.4532732Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:05:33.4533730Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:05:33.4534583Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:05:33.4535566Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:05:33.4536849Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:05:33.4537804Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:05:33.4538741Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:05:33.4539647Z creating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:05:33.4540698Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:05:33.4541753Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:05:33.4542734Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:05:33.4544310Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:05:33.4546006Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:05:33.4547033Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:05:33.4547934Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:05:33.4604002Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:05:33.4659812Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:05:33.4661177Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:05:33.4720671Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:05:33.4722461Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:05:33.4723795Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:05:33.4725163Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:05:33.4726470Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:05:33.4727748Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:05:33.4729037Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:05:33.4730319Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:05:33.4731568Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:05:33.4732736Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:05:33.4733988Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:05:33.4735089Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:05:33.4736279Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:05:33.4737540Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:05:33.4738662Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:05:33.4809030Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:05:33.4810045Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:05:33.4888460Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:05:33.4889459Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:05:33.4890223Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:05:33.4891008Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-12-04T09:05:33.4891863Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-12-04T09:05:33.4892800Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-12-04T09:05:33.4893885Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-12-04T09:05:33.4894934Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-12-04T09:05:33.4896213Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-12-04T09:05:33.4897421Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-12-04T09:05:33.4898473Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-12-04T09:05:33.4899518Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-12-04T09:05:33.4900663Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-12-04T09:05:33.4901674Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-12-04T09:05:33.4902787Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-12-04T09:05:33.5014252Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-12-04T09:05:33.5015311Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-12-04T09:05:33.5016431Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-12-04T09:05:33.5017764Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-12-04T09:05:33.5018900Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-12-04T09:05:33.5019967Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-12-04T09:05:33.5021264Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-12-04T09:05:33.5022369Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 
2025-12-04T09:05:33.5023481Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-12-04T09:05:33.5024579Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-12-04T09:05:33.5025646Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-12-04T09:05:33.5041366Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-12-04T09:05:33.5095523Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-12-04T09:05:33.5096796Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:05:33.5097943Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:05:33.5098858Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-12-04T09:05:33.5099690Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-12-04T09:05:33.5100496Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-12-04T09:05:33.5101302Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2025-12-04T09:05:33.5102077Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-12-04T09:05:33.5102789Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-12-04T09:05:33.5103509Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-12-04T09:05:33.5202813Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-12-04T09:05:33.5241883Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-12-04T09:05:33.5242479Z creating: build/lib/ 2025-12-04T09:05:33.5321538Z inflating: build/lib/libprotobuf-lite.a 2025-12-04T09:05:33.5744156Z inflating: build/lib/libprotobuf.a 2025-12-04T09:05:33.6223067Z inflating: build/lib/libprotoc.a 2025-12-04T09:05:33.6232873Z inflating: build/lib/libpthreadpool.a 2025-12-04T09:05:33.6240974Z inflating: build/lib/libcpuinfo.a 2025-12-04T09:05:33.6248405Z inflating: build/lib/libcpuinfo_internals.a 2025-12-04T09:05:33.6249213Z inflating: build/lib/libclog.a 2025-12-04T09:05:33.6270523Z inflating: build/lib/libpytorch_qnnpack.a 2025-12-04T09:05:33.6271054Z inflating: build/lib/libnnpack_reference_layers.a 2025-12-04T09:05:33.6289974Z inflating: build/lib/libnnpack.a 2025-12-04T09:05:33.6466624Z inflating: build/lib/libmicrokernels-prod.a 2025-12-04T09:05:33.7284479Z inflating: build/lib/libmicrokernels-all.a 2025-12-04T09:05:33.7353618Z inflating: build/lib/libgtest.a 2025-12-04T09:05:33.7370748Z inflating: build/lib/libgmock.a 2025-12-04T09:05:33.7371412Z inflating: build/lib/libgtest_main.a 2025-12-04T09:05:33.7371802Z inflating: build/lib/libgmock_main.a 2025-12-04T09:05:33.7458013Z inflating: build/lib/libXNNPACK.a 2025-12-04T09:05:33.7531309Z inflating: build/lib/libbenchmark.a 2025-12-04T09:05:33.7531894Z inflating: build/lib/libbenchmark_main.a 2025-12-04T09:05:33.7532688Z inflating: build/lib/libjitprofiling.a 2025-12-04T09:05:33.7540926Z inflating: build/lib/libittnotify.a 2025-12-04T09:05:33.7607002Z inflating: build/lib/libasmjit.a 2025-12-04T09:05:33.8693980Z inflating: build/lib/libfbgemm.a 
2025-12-04T09:05:33.8722989Z inflating: build/lib/libtensorpipe_uv.a 2025-12-04T09:05:33.9242423Z inflating: build/lib/libtensorpipe.a 2025-12-04T09:05:33.9474385Z inflating: build/lib/libtensorpipe_cuda.a 2025-12-04T09:05:33.9604410Z inflating: build/lib/libgloo.a 2025-12-04T09:05:33.9650418Z inflating: build/lib/libonnx_proto.a 2025-12-04T09:05:34.0058318Z inflating: build/lib/libgloo_cuda.a 2025-12-04T09:05:34.0744723Z inflating: build/lib/libonnx.a 2025-12-04T09:05:34.0765139Z inflating: build/lib/libfmt.a 2025-12-04T09:05:35.0414387Z inflating: build/lib/libdnnl.a 2025-12-04T09:05:35.0867990Z inflating: build/lib/libkineto.a 2025-12-04T09:05:35.0979986Z inflating: build/lib/libc10.so 2025-12-04T09:05:35.1028082Z inflating: build/lib/libc10_cuda.so 2025-12-04T09:05:35.1029748Z inflating: build/lib/libcaffe2_nvrtc.so 2025-12-04T09:05:35.1031152Z inflating: build/lib/libtorch_global_deps.so 2025-12-04T09:05:38.0651625Z inflating: build/lib/libtorch_cpu.so 2025-12-04T09:05:38.1418315Z inflating: build/lib/libtorch_nvshmem.so 2025-12-04T09:05:41.0512043Z inflating: build/lib/libtorch_cuda.so 2025-12-04T09:05:41.0512514Z inflating: build/lib/libtorch.so 2025-12-04T09:05:41.0564198Z inflating: build/lib/libtorch_cuda_linalg.so 2025-12-04T09:05:41.0632116Z inflating: build/lib/libtorchbind_test.so 2025-12-04T09:05:41.0652527Z inflating: build/lib/libjitbackend_test.so 2025-12-04T09:05:41.0677047Z inflating: build/lib/libbackend_with_compiler.so 2025-12-04T09:05:41.0701995Z inflating: build/lib/libaoti_custom_ops.so 2025-12-04T09:05:41.0704363Z inflating: build/lib/libc10d_cuda_test.so 2025-12-04T09:05:41.0708954Z inflating: build/lib/libshm.so 2025-12-04T09:05:41.2977917Z inflating: build/lib/libtorch_python.so 2025-12-04T09:05:41.3012958Z inflating: build/lib/libnnapi_backend.so 2025-12-04T09:05:41.3013608Z creating: build/bin/ 2025-12-04T09:05:41.3450415Z inflating: build/bin/protoc-3.13.0.0 2025-12-04T09:05:41.3890707Z inflating: build/bin/protoc 2025-12-04T09:05:41.3950966Z inflating: build/bin/c10_AllocatorConfig_test 2025-12-04T09:05:41.4004312Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-12-04T09:05:41.4058622Z inflating: build/bin/c10_DeviceGuard_test 2025-12-04T09:05:41.4115156Z inflating: build/bin/c10_Device_test 2025-12-04T09:05:41.4178334Z inflating: build/bin/c10_DispatchKeySet_test 2025-12-04T09:05:41.4236585Z inflating: build/bin/c10_Scalar_test 2025-12-04T09:05:41.4290470Z inflating: build/bin/c10_StreamGuard_test 2025-12-04T09:05:41.4352971Z inflating: build/bin/c10_SymInt_test 2025-12-04T09:05:41.4410630Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-12-04T09:05:41.4471281Z inflating: build/bin/c10_InlineStreamGuard_test 2025-12-04T09:05:41.4523113Z inflating: build/bin/c10_ConstexprCrc_test 2025-12-04T09:05:41.4583137Z inflating: build/bin/c10_SizesAndStrides_test 2025-12-04T09:05:41.4658026Z inflating: build/bin/c10_cow_test 2025-12-04T09:05:41.4714426Z inflating: build/bin/c10_Bitset_test 2025-12-04T09:05:41.4768766Z inflating: build/bin/c10_ArrayRef_test 2025-12-04T09:05:41.4821026Z inflating: build/bin/c10_DeadlockDetection_test 2025-12-04T09:05:41.4880972Z inflating: build/bin/c10_IntrusiveList_test 2025-12-04T09:05:41.4941775Z inflating: build/bin/c10_LeftRight_test 2025-12-04T09:05:41.4998527Z inflating: build/bin/c10_Half_test 2025-12-04T09:05:41.5052310Z inflating: build/bin/c10_Semaphore_test 2025-12-04T09:05:41.5113473Z inflating: build/bin/c10_Enumerate_test 2025-12-04T09:05:41.5171526Z inflating: build/bin/c10_NetworkFlow_test 
2025-12-04T09:05:41.5224722Z inflating: build/bin/c10_Synchronized_test 2025-12-04T09:05:41.5286052Z inflating: build/bin/c10_ThreadLocal_test 2025-12-04T09:05:41.5342113Z inflating: build/bin/c10_accumulate_test 2025-12-04T09:05:41.5398545Z inflating: build/bin/c10_TypeIndex_test 2025-12-04T09:05:41.5453306Z inflating: build/bin/c10_bit_cast_test 2025-12-04T09:05:41.5513062Z inflating: build/bin/c10_bfloat16_test 2025-12-04T09:05:41.5574338Z inflating: build/bin/c10_complex_math_test 2025-12-04T09:05:41.5629702Z inflating: build/bin/c10_exception_test 2025-12-04T09:05:41.5683825Z inflating: build/bin/c10_error_test 2025-12-04T09:05:41.5743533Z inflating: build/bin/c10_complex_test 2025-12-04T09:05:41.5799693Z inflating: build/bin/c10_flags_test 2025-12-04T09:05:41.5855786Z inflating: build/bin/c10_generic_math_test 2025-12-04T09:05:41.6012505Z inflating: build/bin/c10_intrusive_ptr_test 2025-12-04T09:05:41.6066605Z inflating: build/bin/c10_irange_test 2025-12-04T09:05:41.6123571Z inflating: build/bin/c10_lazy_test 2025-12-04T09:05:41.6177681Z inflating: build/bin/c10_nofatal_test 2025-12-04T09:05:41.6239501Z inflating: build/bin/c10_logging_test 2025-12-04T09:05:41.6319375Z inflating: build/bin/c10_optional_test 2025-12-04T09:05:41.6384857Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-12-04T09:05:41.6538310Z inflating: build/bin/c10_small_vector_test 2025-12-04T09:05:41.6596552Z inflating: build/bin/c10_registry_test 2025-12-04T09:05:41.6656967Z inflating: build/bin/c10_string_util_test 2025-12-04T09:05:41.6712049Z inflating: build/bin/c10_ssize_test 2025-12-04T09:05:41.6766099Z inflating: build/bin/c10_string_view_test 2025-12-04T09:05:41.6813512Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-12-04T09:05:41.6868296Z inflating: build/bin/c10_tempfile_test 2025-12-04T09:05:41.6928050Z inflating: build/bin/c10_typeid_test 2025-12-04T09:05:41.6984789Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2025-12-04T09:05:41.7043164Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2025-12-04T09:05:41.7098372Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2025-12-04T09:05:41.7155000Z inflating: build/bin/c10_cuda_CUDATest 2025-12-04T09:05:41.7211109Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2025-12-04T09:05:41.7266600Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2025-12-04T09:05:41.7324431Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2025-12-04T09:05:41.7381136Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2025-12-04T09:05:41.7957234Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-12-04T09:05:41.8539674Z inflating: build/bin/vec_test_all_types_AVX512 2025-12-04T09:05:41.9138242Z inflating: build/bin/vec_test_all_types_AVX2 2025-12-04T09:05:41.9192789Z inflating: build/bin/test_vec_half_DEFAULT 2025-12-04T09:05:41.9292686Z inflating: build/bin/test_aoti_abi_check 2025-12-04T09:05:41.9347922Z inflating: build/bin/test_vec_half_AVX512 2025-12-04T09:05:41.9402438Z inflating: build/bin/test_vec_half_AVX2 2025-12-04T09:05:41.9480358Z inflating: build/bin/Dict_test 2025-12-04T09:05:41.9537772Z inflating: build/bin/Dimname_test 2025-12-04T09:05:41.9606218Z inflating: build/bin/MaybeOwned_test 2025-12-04T09:05:41.9667325Z inflating: build/bin/NamedTensor_test 2025-12-04T09:05:41.9730169Z inflating: build/bin/apply_utils_test 2025-12-04T09:05:41.9794259Z inflating: 
build/bin/atest 2025-12-04T09:05:41.9863982Z inflating: build/bin/basic 2025-12-04T09:05:41.9922926Z inflating: build/bin/broadcast_test 2025-12-04T09:05:41.9977743Z inflating: build/bin/cpu_allocator_test 2025-12-04T09:05:42.0040235Z inflating: build/bin/cpu_generator_test 2025-12-04T09:05:42.0096487Z inflating: build/bin/cpu_profiling_allocator_test 2025-12-04T09:05:42.0193489Z inflating: build/bin/cpu_rng_test 2025-12-04T09:05:42.0249623Z inflating: build/bin/dlconvertor_test 2025-12-04T09:05:42.0311035Z inflating: build/bin/extension_backend_test 2025-12-04T09:05:42.0371144Z inflating: build/bin/half_test 2025-12-04T09:05:42.0472032Z inflating: build/bin/ivalue_test 2025-12-04T09:05:42.0524502Z inflating: build/bin/lazy_tensor_test 2025-12-04T09:05:42.0581261Z inflating: build/bin/math_kernel_test 2025-12-04T09:05:42.0639151Z inflating: build/bin/memory_format_test 2025-12-04T09:05:42.0695826Z inflating: build/bin/memory_overlapping_test 2025-12-04T09:05:42.0754953Z inflating: build/bin/mobile_memory_cleanup 2025-12-04T09:05:42.0813844Z inflating: build/bin/native_test 2025-12-04T09:05:42.0871017Z inflating: build/bin/operator_name_test 2025-12-04T09:05:42.0924639Z inflating: build/bin/operators_test 2025-12-04T09:05:42.0980856Z inflating: build/bin/packedtensoraccessor_test 2025-12-04T09:05:42.1052333Z inflating: build/bin/pow_test 2025-12-04T09:05:42.1113118Z inflating: build/bin/quantized_test 2025-12-04T09:05:42.1167753Z inflating: build/bin/reduce_ops_test 2025-12-04T09:05:42.1221549Z inflating: build/bin/reportMemoryUsage_test 2025-12-04T09:05:42.1282100Z inflating: build/bin/scalar_tensor_test 2025-12-04T09:05:42.1344527Z inflating: build/bin/scalar_test 2025-12-04T09:05:42.1400823Z inflating: build/bin/StorageUtils_test 2025-12-04T09:05:42.1457175Z inflating: build/bin/stride_properties_test 2025-12-04T09:05:42.1538922Z inflating: build/bin/tensor_iterator_test 2025-12-04T09:05:42.1598108Z inflating: build/bin/test_parallel 2025-12-04T09:05:42.1652793Z inflating: build/bin/thread_init_test 2025-12-04T09:05:42.1711749Z inflating: build/bin/type_ptr_test 2025-12-04T09:05:42.1775338Z inflating: build/bin/type_test 2025-12-04T09:05:42.1830226Z inflating: build/bin/undefined_tensor_test 2025-12-04T09:05:42.1885635Z inflating: build/bin/verify_api_visibility 2025-12-04T09:05:42.1962014Z inflating: build/bin/legacy_vmap_test 2025-12-04T09:05:42.2017095Z inflating: build/bin/weakref_test 2025-12-04T09:05:42.2072641Z inflating: build/bin/wrapdim_test 2025-12-04T09:05:42.2126852Z inflating: build/bin/xla_tensor_test 2025-12-04T09:05:42.2192447Z inflating: build/bin/IListRef_test 2025-12-04T09:05:42.2297833Z inflating: build/bin/List_test 2025-12-04T09:05:42.2369598Z inflating: build/bin/KernelFunction_test 2025-12-04T09:05:42.2490837Z inflating: build/bin/kernel_function_legacy_test 2025-12-04T09:05:42.2591051Z inflating: build/bin/kernel_function_test 2025-12-04T09:05:42.2718692Z inflating: build/bin/kernel_lambda_legacy_test 2025-12-04T09:05:42.2821491Z inflating: build/bin/kernel_lambda_test 2025-12-04T09:05:42.2887403Z inflating: build/bin/kernel_stackbased_test 2025-12-04T09:05:42.2985397Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-12-04T09:05:42.3041642Z inflating: build/bin/CppSignature_test 2025-12-04T09:05:42.3098307Z inflating: build/bin/backend_fallback_test 2025-12-04T09:05:42.3154056Z inflating: build/bin/op_allowlist_test 2025-12-04T09:05:42.3459406Z inflating: build/bin/op_registration_test 2025-12-04T09:05:42.3529451Z inflating: 
build/bin/inline_container_test 2025-12-04T09:05:42.3586671Z inflating: build/bin/cuda_allocator_test 2025-12-04T09:05:42.3644022Z inflating: build/bin/cuda_apply_test 2025-12-04T09:05:42.3706624Z inflating: build/bin/cuda_atomic_ops_test 2025-12-04T09:05:42.3768339Z inflating: build/bin/cuda_caching_host_allocator_test 2025-12-04T09:05:42.3842897Z inflating: build/bin/cuda_complex_math_test 2025-12-04T09:05:42.3905002Z inflating: build/bin/cuda_complex_test 2025-12-04T09:05:42.3974395Z inflating: build/bin/cuda_cub_test 2025-12-04T09:05:42.4030310Z inflating: build/bin/cuda_cublas_handle_pool_test 2025-12-04T09:05:42.4084155Z inflating: build/bin/cuda_device_test 2025-12-04T09:05:42.4164491Z inflating: build/bin/cuda_distributions_test 2025-12-04T09:05:42.4217990Z inflating: build/bin/cuda_dlconvertor_test 2025-12-04T09:05:42.4276387Z inflating: build/bin/cuda_event_test 2025-12-04T09:05:42.4328506Z inflating: build/bin/cuda_exchange_device_test 2025-12-04T09:05:42.4391316Z inflating: build/bin/cuda_generator_test 2025-12-04T09:05:42.4445096Z inflating: build/bin/cuda_half_test 2025-12-04T09:05:42.4497829Z inflating: build/bin/cuda_allocatorTraceTracker_test 2025-12-04T09:05:42.4563779Z inflating: build/bin/cuda_stream_test 2025-12-04T09:05:42.4618375Z inflating: build/bin/cuda_reportMemoryUsage_test 2025-12-04T09:05:42.4673581Z inflating: build/bin/cuda_cudnn_test 2025-12-04T09:05:42.4728377Z inflating: build/bin/cuda_integer_divider_test 2025-12-04T09:05:42.4781741Z inflating: build/bin/cuda_optional_test 2025-12-04T09:05:42.4838923Z inflating: build/bin/cuda_packedtensoraccessor_test 2025-12-04T09:05:42.4895782Z inflating: build/bin/cuda_vectorized_test 2025-12-04T09:05:42.5975867Z inflating: build/bin/test_jit 2025-12-04T09:05:42.6322002Z inflating: build/bin/test_lazy 2025-12-04T09:05:42.6378593Z inflating: build/bin/BackoffTest 2025-12-04T09:05:42.6436913Z inflating: build/bin/FileStoreTest 2025-12-04T09:05:42.6496418Z inflating: build/bin/TCPStoreTest 2025-12-04T09:05:42.6555648Z inflating: build/bin/HashStoreTest 2025-12-04T09:05:42.6570177Z inflating: build/bin/ProcessGroupMPITest 2025-12-04T09:05:42.6571577Z inflating: build/bin/example_allreduce 2025-12-04T09:05:42.6630731Z inflating: build/bin/test_dist_autograd 2025-12-04T09:05:42.6701707Z inflating: build/bin/test_cpp_rpc 2025-12-04T09:05:42.6773888Z inflating: build/bin/ProcessGroupGlooTest 2025-12-04T09:05:42.6834954Z inflating: build/bin/ProcessGroupGlooAsyncTest 2025-12-04T09:05:42.6902114Z inflating: build/bin/ProcessGroupNCCLTest 2025-12-04T09:05:42.6969280Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2025-12-04T09:05:42.8117384Z inflating: build/bin/test_api 2025-12-04T09:05:42.8118378Z inflating: build/bin/parallel_benchmark 2025-12-04T09:05:42.8122624Z inflating: build/bin/torch_shm_manager 2025-12-04T09:05:42.8123026Z creating: .additional_ci_files/ 2025-12-04T09:05:42.8186362Z inflating: .additional_ci_files/test-times.json 2025-12-04T09:05:42.8413913Z inflating: .additional_ci_files/test-class-times.json 2025-12-04T09:05:42.8440445Z ##[group]Run rm artifacts.zip 2025-12-04T09:05:42.8440768Z rm artifacts.zip 2025-12-04T09:05:42.8449641Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:42.8450023Z env: 2025-12-04T09:05:42.8450399Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:42.8450684Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:42.8451001Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:42.8451379Z ##[endgroup] 2025-12-04T09:05:42.9158566Z ##[group]Run df -H 
2025-12-04T09:05:42.9158833Z df -H 2025-12-04T09:05:42.9164605Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:42.9165163Z env: 2025-12-04T09:05:42.9165399Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:42.9165681Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:42.9166025Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:42.9166417Z ##[endgroup] 2025-12-04T09:05:42.9210213Z Filesystem Size Used Avail Use% Mounted on 2025-12-04T09:05:42.9210774Z devtmpfs 4.2M 0 4.2M 0% /dev 2025-12-04T09:05:42.9211164Z tmpfs 101G 0 101G 0% /dev/shm 2025-12-04T09:05:42.9211567Z tmpfs 41G 693k 41G 1% /run 2025-12-04T09:05:42.9211997Z /dev/nvme0n1p1 161G 54G 108G 34% / 2025-12-04T09:05:42.9212348Z tmpfs 101G 17k 101G 1% /tmp 2025-12-04T09:05:42.9212848Z /dev/nvme0n1p128 11M 1.4M 9.2M 13% /boot/efi 2025-12-04T09:05:42.9213256Z tmpfs 21G 0 21G 0% /run/user/0 2025-12-04T09:05:42.9249802Z Prepare all required actions 2025-12-04T09:05:42.9250618Z Getting action download info 2025-12-04T09:05:43.0716848Z ##[group]Run ./.github/actions/download-td-artifacts 2025-12-04T09:05:43.0717219Z with: 2025-12-04T09:05:43.0717434Z env: 2025-12-04T09:05:43.0717807Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.0718101Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.0718474Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.0718854Z ##[endgroup] 2025-12-04T09:05:43.0749968Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T09:05:43.0750350Z with: 2025-12-04T09:05:43.0750585Z name: td_results 2025-12-04T09:05:43.0750856Z s3-bucket: gha-artifacts 2025-12-04T09:05:43.0751136Z region: us-east-1 2025-12-04T09:05:43.0751384Z env: 2025-12-04T09:05:43.0751620Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.0751901Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.0752249Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.0752706Z ##[endgroup] 2025-12-04T09:05:43.5513013Z (node:62886) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:05:43.5513630Z 2025-12-04T09:05:43.5513864Z Please migrate your code to use AWS SDK for JavaScript (v3). 
2025-12-04T09:05:43.5514477Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:05:43.5515113Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:05:43.6614904Z Found 1 objects with prefix pytorch/pytorch/19922768520/td_results/ 2025-12-04T09:05:43.6615792Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:05:43.7194862Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:05:43.7198743Z Artifact download has finished successfully 2025-12-04T09:05:43.7366934Z ##[group]Run mkdir -p .additional_ci_files 2025-12-04T09:05:43.7367380Z mkdir -p .additional_ci_files 2025-12-04T09:05:43.7367877Z mv td_results.json .additional_ci_files/td_results.json || true 2025-12-04T09:05:43.7374455Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:43.7374883Z env: 2025-12-04T09:05:43.7375128Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.7375415Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.7375766Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.7376285Z ##[endgroup] 2025-12-04T09:05:43.7474464Z ##[group]Run .github/scripts/parse_ref.py 2025-12-04T09:05:43.7475065Z .github/scripts/parse_ref.py 2025-12-04T09:05:43.7480782Z shell: /usr/bin/bash -e {0} 2025-12-04T09:05:43.7481076Z env: 2025-12-04T09:05:43.7481322Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.7481625Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.7481967Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.7482377Z ##[endgroup] 2025-12-04T09:05:43.7697667Z Setting output branch=main 2025-12-04T09:05:43.7833904Z Prepare all required actions 2025-12-04T09:05:43.7834310Z Getting action download info 2025-12-04T09:05:43.9254504Z ##[group]Run ./.github/actions/filter-test-configs 2025-12-04T09:05:43.9255000Z with: 2025-12-04T09:05:43.9255402Z github-token: *** 2025-12-04T09:05:43.9267370Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": 
"distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T09:05:43.9278676Z job-name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:43.9279299Z env: 2025-12-04T09:05:43.9279520Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.9279799Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.9280124Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.9280478Z ##[endgroup] 2025-12-04T09:05:43.9317176Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T09:05:43.9317469Z with: 2025-12-04T09:05:43.9317667Z shell: bash 2025-12-04T09:05:43.9317880Z timeout_minutes: 10 2025-12-04T09:05:43.9318109Z max_attempts: 5 2025-12-04T09:05:43.9318335Z retry_wait_seconds: 30 2025-12-04T09:05:43.9319291Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:05:43.9320236Z polling_interval_seconds: 1 2025-12-04T09:05:43.9320504Z warning_on_retry: true 2025-12-04T09:05:43.9320926Z continue_on_error: false 2025-12-04T09:05:43.9321349Z env: 2025-12-04T09:05:43.9321564Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.9322034Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.9322382Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.9322948Z GITHUB_TOKEN: *** 2025-12-04T09:05:43.9323205Z ##[endgroup] 2025-12-04T09:05:44.0324042Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:05:44.2817554Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:05:44.4039873Z Collecting requests==2.27.1 2025-12-04T09:05:44.4198358Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-12-04T09:05:44.6072912Z Collecting pyyaml==6.0.2 2025-12-04T09:05:44.6115847Z Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB) 2025-12-04T09:05:44.6358696Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (1.25.10) 2025-12-04T09:05:44.6367272Z 
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (2.10) 2025-12-04T09:05:44.6892921Z Collecting certifi>=2017.4.17 2025-12-04T09:05:44.6926352Z Downloading certifi-2025.11.12-py3-none-any.whl (159 kB) 2025-12-04T09:05:45.1159359Z Collecting charset-normalizer~=2.0.0 2025-12-04T09:05:45.1198412Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-12-04T09:05:45.2131104Z Installing collected packages: charset-normalizer, certifi, requests, pyyaml 2025-12-04T09:05:45.3413514Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1 2025-12-04T09:05:46.0152279Z Command completed after 1 attempt(s). 2025-12-04T09:05:46.0200831Z ##[group]Run set -x 2025-12-04T09:05:46.0201115Z set -x 2025-12-04T09:05:46.0201472Z  2025-12-04T09:05:46.0202510Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:05:46.0203049Z # in runner workspace 2025-12-04T09:05:46.0203461Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-12-04T09:05:46.0209745Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.0210137Z env: 2025-12-04T09:05:46.0210352Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.0210630Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.0210960Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.0211316Z ##[endgroup] 2025-12-04T09:05:46.0239108Z + python3 /home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-12-04T09:05:46.0429160Z Setting output branch=main 2025-12-04T09:05:46.0483583Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:05:46.0484069Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:05:46.0484594Z echo "Job name: ${JOB_NAME}" 2025-12-04T09:05:46.0484925Z  2025-12-04T09:05:46.0485346Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:05:46.0485882Z # in runner workspace 2025-12-04T09:05:46.0486342Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-12-04T09:05:46.0486875Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-12-04T09:05:46.0487247Z  --job-name "${JOB_NAME}" \ 2025-12-04T09:05:46.0498844Z  --test-matrix "{"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, 
"num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}" \ 2025-12-04T09:05:46.0510393Z  --selected-test-configs "" \ 2025-12-04T09:05:46.0510746Z  --pr-number "${PR_NUMBER}" \ 2025-12-04T09:05:46.0511079Z  --tag "${TAG}" \ 2025-12-04T09:05:46.0511383Z  --event-name "${EVENT_NAME}" \ 2025-12-04T09:05:46.0511723Z  --schedule "${SCHEDULE}" \ 2025-12-04T09:05:46.0512040Z  --branch "${HEAD_BRANCH}" 2025-12-04T09:05:46.0517478Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.0517881Z env: 2025-12-04T09:05:46.0518094Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.0518372Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.0518696Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.0519283Z GITHUB_TOKEN: *** 2025-12-04T09:05:46.0519850Z JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:46.0520487Z PR_NUMBER: 2025-12-04T09:05:46.0520712Z TAG: 2025-12-04T09:05:46.0521277Z EVENT_NAME: schedule 2025-12-04T09:05:46.0521549Z SCHEDULE: 29 8 * * * 2025-12-04T09:05:46.0521990Z HEAD_BRANCH: main 2025-12-04T09:05:46.0522243Z ##[endgroup] 2025-12-04T09:05:46.0546114Z Workflow: trunk 2025-12-04T09:05:46.0547041Z Job name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:46.2471347Z Setting output keep-going=True 2025-12-04T09:05:46.2471788Z Setting output ci-verbose-test-logs=False 2025-12-04T09:05:46.2472416Z Setting output ci-test-showlocals=False 2025-12-04T09:05:46.2472811Z Setting output ci-no-test-timeout=False 2025-12-04T09:05:46.2473181Z Setting output ci-no-td=False 2025-12-04T09:05:46.2473529Z Setting output ci-td-distributed=False 2025-12-04T09:05:46.2473884Z Setting output is-unstable=False 2025-12-04T09:05:46.2474234Z Setting output 
reenabled-issues= 2025-12-04T09:05:46.2500992Z Setting output test-matrix={"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", 
"shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T09:05:46.2527207Z Setting output is-test-matrix-empty=False 2025-12-04T09:05:46.2599100Z ##[group]Run echo "Filtered matrix:" 2025-12-04T09:05:46.2599556Z echo "Filtered matrix:" 
2025-12-04T09:05:46.2625757Z echo "{"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": 
"lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}" 2025-12-04T09:05:46.2649991Z  2025-12-04T09:05:46.2650208Z echo 2025-12-04T09:05:46.2650493Z echo "Is the current job unstable? 
False" 2025-12-04T09:05:46.2650830Z  2025-12-04T09:05:46.2651040Z echo 2025-12-04T09:05:46.2651302Z echo "Is keep-going label set? True" 2025-12-04T09:05:46.2651621Z  2025-12-04T09:05:46.2651828Z echo 2025-12-04T09:05:46.2652072Z echo "Reenabled issues? " 2025-12-04T09:05:46.2657980Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.2658418Z env: 2025-12-04T09:05:46.2658668Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.2658972Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.2659340Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.2659753Z ##[endgroup] 2025-12-04T09:05:46.2683840Z Filtered matrix: 2025-12-04T09:05:46.2716274Z {include: [{config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 5, runner: 
lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}]} 2025-12-04T09:05:46.2743193Z 2025-12-04T09:05:46.2743342Z Is the current job unstable? False 2025-12-04T09:05:46.2743577Z 2025-12-04T09:05:46.2743716Z Is keep-going label set? True 2025-12-04T09:05:46.2743938Z 2025-12-04T09:05:46.2744046Z Reenabled issues? 
2025-12-04T09:05:46.2774990Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:05:46.2775712Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:05:46.2781892Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.2782324Z env: 2025-12-04T09:05:46.2782574Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.2782886Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.2783251Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.2783654Z JOB_TIMEOUT: 600 2025-12-04T09:05:46.2783917Z ##[endgroup] 2025-12-04T09:05:46.2832430Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:46.2833130Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:46.2833639Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:46.2839338Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.2839749Z env: 2025-12-04T09:05:46.2839992Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.2840295Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.2840637Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.2841038Z ##[endgroup] 2025-12-04T09:05:46.2937940Z ##[group]Run set -x 2025-12-04T09:05:46.2938319Z set -x 2025-12-04T09:05:46.2938581Z  2025-12-04T09:05:46.2938872Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-12-04T09:05:46.2939320Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-12-04T09:05:46.2939789Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-12-04T09:05:46.2940211Z  TEST_COMMAND=.ci/onnx/test.sh 2025-12-04T09:05:46.2940551Z else 2025-12-04T09:05:46.2940843Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:05:46.2941202Z fi 2025-12-04T09:05:46.2941439Z  2025-12-04T09:05:46.2941743Z # Leaving 1GB for the runner and other things 2025-12-04T09:05:46.2942428Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo) 2025-12-04T09:05:46.2943460Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap 2025-12-04T09:05:46.2944286Z # comes from https://github.com/pytorch/test-infra/pull/6058 2025-12-04T09:05:46.2944913Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3)) 2025-12-04T09:05:46.2945403Z  2025-12-04T09:05:46.2945698Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:05:46.2946103Z  SHM_OPTS= 2025-12-04T09:05:46.2946390Z  JENKINS_USER= 2025-12-04T09:05:46.2946801Z  # ensure that docker container cleanly exits in 12 hours 2025-12-04T09:05:46.2947351Z  # if for some reason cleanup action doesn't stop container 2025-12-04T09:05:46.2947942Z  # when job is cancelled 2025-12-04T09:05:46.2948301Z  DOCKER_SHELL_CMD="sleep 12h" 2025-12-04T09:05:46.2948803Z  USED_IMAGE="${DOCKER_IMAGE_S390X}" 2025-12-04T09:05:46.2949131Z else 2025-12-04T09:05:46.2949391Z  SHM_OPTS="--shm-size=${SHM_SIZE}" 2025-12-04T09:05:46.2949733Z  JENKINS_USER="--user jenkins" 2025-12-04T09:05:46.2950060Z  DOCKER_SHELL_CMD= 2025-12-04T09:05:46.2950356Z  USED_IMAGE="${DOCKER_IMAGE}" 2025-12-04T09:05:46.2950662Z fi 2025-12-04T09:05:46.2950862Z  2025-12-04T09:05:46.2951214Z # detached container should get cleaned up by teardown_ec2_linux 2025-12-04T09:05:46.2951777Z # TODO: Stop building test binaries as part of the build phase 2025-12-04T09:05:46.2952404Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice 2025-12-04T09:05:46.2952970Z # shellcheck disable=SC2086,SC2090 
2025-12-04T09:05:46.2953324Z container_name=$(docker run \ 2025-12-04T09:05:46.2953647Z  ${GPU_FLAG:-} \ 2025-12-04T09:05:46.2953952Z  ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \ 2025-12-04T09:05:46.2954311Z  -e BUILD_ENVIRONMENT \ 2025-12-04T09:05:46.2954618Z  -e PR_NUMBER \ 2025-12-04T09:05:46.2954895Z  -e GITHUB_ACTIONS \ 2025-12-04T09:05:46.2955195Z  -e GITHUB_REPOSITORY \ 2025-12-04T09:05:46.2955507Z  -e GITHUB_WORKFLOW \ 2025-12-04T09:05:46.2955791Z  -e GITHUB_JOB \ 2025-12-04T09:05:46.2956073Z  -e GITHUB_RUN_ID \ 2025-12-04T09:05:46.2956364Z  -e GITHUB_RUN_NUMBER \ 2025-12-04T09:05:46.2956676Z  -e GITHUB_RUN_ATTEMPT \ 2025-12-04T09:05:46.2956967Z  -e JOB_ID \ 2025-12-04T09:05:46.2957231Z  -e JOB_NAME \ 2025-12-04T09:05:46.2957499Z  -e BASE_SHA \ 2025-12-04T09:05:46.2957751Z  -e BRANCH \ 2025-12-04T09:05:46.2958009Z  -e SHA1 \ 2025-12-04T09:05:46.2958272Z  -e AWS_DEFAULT_REGION \ 2025-12-04T09:05:46.2958566Z  -e IN_WHEEL_TEST \ 2025-12-04T09:05:46.2958854Z  -e SHARD_NUMBER \ 2025-12-04T09:05:46.2959138Z  -e TEST_CONFIG \ 2025-12-04T09:05:46.2959413Z  -e NUM_TEST_SHARDS \ 2025-12-04T09:05:46.2959806Z  -e REENABLED_ISSUES \ 2025-12-04T09:05:46.2960131Z  -e CONTINUE_THROUGH_ERROR \ 2025-12-04T09:05:46.2960459Z  -e VERBOSE_TEST_LOGS \ 2025-12-04T09:05:46.2960756Z  -e TEST_SHOWLOCALS \ 2025-12-04T09:05:46.2961059Z  -e NO_TEST_TIMEOUT \ 2025-12-04T09:05:46.2961352Z  -e NO_TD \ 2025-12-04T09:05:46.2961607Z  -e TD_DISTRIBUTED \ 2025-12-04T09:05:46.2961902Z  -e PR_LABELS \ 2025-12-04T09:05:46.2962209Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-12-04T09:05:46.2962547Z  -e SCCACHE_BUCKET \ 2025-12-04T09:05:46.2962842Z  -e SCCACHE_REGION \ 2025-12-04T09:05:46.2963133Z  -e XLA_CUDA \ 2025-12-04T09:05:46.2963421Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2025-12-04T09:05:46.2963796Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-12-04T09:05:46.2964183Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-12-04T09:05:46.2964572Z  -e SKIP_SCCACHE_INITIALIZATION=1 \ 2025-12-04T09:05:46.2964917Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-12-04T09:05:46.2965259Z  -e VLLM_TEST_HUGGING_FACE_TOKEN \ 2025-12-04T09:05:46.2965619Z  -e SCRIBE_GRAPHQL_ACCESS_TOKEN \ 2025-12-04T09:05:46.2965943Z  -e DASHBOARD_TAG \ 2025-12-04T09:05:46.2966243Z  -e ARTIFACTS_FILE_SUFFIX \ 2025-12-04T09:05:46.2966618Z  --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \ 2025-12-04T09:05:46.2978205Z  --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \ 2025-12-04T09:05:46.2978798Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T09:05:46.2979279Z  --security-opt seccomp=unconfined \ 2025-12-04T09:05:46.2979805Z  --cap-add=SYS_PTRACE \ 2025-12-04T09:05:46.2980150Z  --ipc=host \ 2025-12-04T09:05:46.2980451Z  ${SHM_OPTS} \ 2025-12-04T09:05:46.2980732Z  --tty \ 2025-12-04T09:05:46.2981008Z  --detach \ 2025-12-04T09:05:46.2981324Z  --name="${container_name}" \ 2025-12-04T09:05:46.2981677Z  ${JENKINS_USER} \ 2025-12-04T09:05:46.2982081Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-12-04T09:05:46.2982549Z  -w /var/lib/jenkins/workspace \ 2025-12-04T09:05:46.2982921Z  "${USED_IMAGE}" \ 2025-12-04T09:05:46.2983227Z  ${DOCKER_SHELL_CMD} 2025-12-04T09:05:46.2983535Z ) 2025-12-04T09:05:46.2983925Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" 2025-12-04T09:05:46.2984393Z  2025-12-04T09:05:46.2984701Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:05:46.2985393Z  docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt" 2025-12-04T09:05:46.2985992Z fi 2025-12-04T09:05:46.2986231Z  
2025-12-04T09:05:46.2986812Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" 2025-12-04T09:05:46.2992612Z shell: /usr/bin/bash -e {0} 2025-12-04T09:05:46.2992876Z env: 2025-12-04T09:05:46.2993102Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.2993380Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.2993691Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.2994139Z BUILD_ENVIRONMENT: linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:05:46.2994508Z PR_NUMBER: 2025-12-04T09:05:46.2994743Z GITHUB_REPOSITORY: pytorch/pytorch 2025-12-04T09:05:46.2995063Z GITHUB_WORKFLOW: trunk 2025-12-04T09:05:46.2995331Z GITHUB_JOB: test 2025-12-04T09:05:46.2995565Z GITHUB_RUN_ID: 19922768520 2025-12-04T09:05:46.2995857Z GITHUB_RUN_NUMBER: 158165 2025-12-04T09:05:46.2996139Z GITHUB_RUN_ATTEMPT: 1 2025-12-04T09:05:46.2996394Z JOB_ID: 57116084904 2025-12-04T09:05:46.2996965Z JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:46.2997696Z BRANCH: main 2025-12-04T09:05:46.2997972Z SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:46.2998361Z BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:46.2998727Z TEST_CONFIG: distributed 2025-12-04T09:05:46.2998998Z SHARD_NUMBER: 3 2025-12-04T09:05:46.2999227Z NUM_TEST_SHARDS: 3 2025-12-04T09:05:46.2999476Z EXTRA_FLAGS: 2025-12-04T09:05:46.2999711Z OP_BENCHMARK_TESTS: 2025-12-04T09:05:46.2999969Z REENABLED_ISSUES: 2025-12-04T09:05:46.3000217Z CONTINUE_THROUGH_ERROR: True 2025-12-04T09:05:46.3000507Z VERBOSE_TEST_LOGS: False 2025-12-04T09:05:46.3000783Z TEST_SHOWLOCALS: False 2025-12-04T09:05:46.3001038Z NO_TEST_TIMEOUT: False 2025-12-04T09:05:46.3001297Z NO_TD: False 2025-12-04T09:05:46.3001536Z TD_DISTRIBUTED: False 2025-12-04T09:05:46.3001846Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2025-12-04T09:05:46.3002218Z SCCACHE_REGION: us-east-1 2025-12-04T09:05:46.3002489Z SHM_SIZE: 2g 2025-12-04T09:05:46.3003295Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:46.3004770Z DOCKER_IMAGE_S390X: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:46.3005672Z XLA_CUDA: 2025-12-04T09:05:46.3006037Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:05:46.3006499Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1 2025-12-04T09:05:46.3006831Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-12-04T09:05:46.3007145Z DASHBOARD_TAG: 2025-12-04T09:05:46.3007661Z VLLM_TEST_HUGGING_FACE_TOKEN: *** 2025-12-04T09:05:46.3008084Z HUGGING_FACE_HUB_TOKEN: *** 2025-12-04T09:05:46.3008501Z SCRIBE_GRAPHQL_ACCESS_TOKEN: *** 2025-12-04T09:05:46.3009043Z ARTIFACTS_FILE_SUFFIX: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904 2025-12-04T09:05:46.3009597Z ##[endgroup] 2025-12-04T09:05:46.3032391Z + [[ distributed == \m\u\l\t\i\g\p\u ]] 2025-12-04T09:05:46.3032950Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *onnx* ]] 2025-12-04T09:05:46.3033406Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:05:46.3036972Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo 2025-12-04T09:05:46.3056594Z + TOTAL_AVAILABLE_MEMORY_IN_GB='185.682 ' 2025-12-04T09:05:46.3057140Z + TOTAL_MEMORY_WITH_SWAP=188 
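Note: the --memory/--memory-swap values that appear in the docker run trace below come from the arithmetic in the step above. A minimal standalone sketch of that calculation (same commands as the step, shown only so the logged values 185g/188g are easy to verify):

# Leave 1 GiB of the host's MemTotal for the runner itself.
TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo)
# Allow 3 GiB of swap on top of the memory limit (see the test-infra PR referenced above).
TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3))
# On this runner: 185.682 GiB -> --memory=185g, --memory-swap=188g
echo "--memory=${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g --memory-swap=${TOTAL_MEMORY_WITH_SWAP}g"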
2025-12-04T09:05:46.3057603Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:05:46.3058046Z + SHM_OPTS=--shm-size=2g 2025-12-04T09:05:46.3058346Z + JENKINS_USER='--user jenkins' 2025-12-04T09:05:46.3058671Z + DOCKER_SHELL_CMD= 2025-12-04T09:05:46.3059600Z + USED_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:46.3065617Z +++ nproc --ignore=2 2025-12-04T09:05:46.3098112Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=46 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=185g --memory-swap=188g --env-file=/tmp/github_env_19922768520 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:59.4075306Z + container_name=9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T09:05:59.4076212Z + echo DOCKER_CONTAINER_ID=9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T09:05:59.4076891Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:05:59.4079447Z ++ echo dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T09:05:59.4081852Z + docker exec -t 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 sh -c 'python3 -m pip install dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh' 2025-12-04T09:05:59.9165181Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl (from torch==2.10.0a0+gitffd9b0f) 2025-12-04T09:06:01.0220301Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.18.0) 2025-12-04T09:06:01.0223228Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (4.12.2) 2025-12-04T09:06:01.0228214Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.13.3) 2025-12-04T09:06:01.0233449Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2.8.8) 2025-12-04T09:06:01.0237527Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from 
torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.1.6) 2025-12-04T09:06:01.0242370Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2025.10.0) 2025-12-04T09:06:01.0259003Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.3.0) 2025-12-04T09:06:01.0666246Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.22.4) 2025-12-04T09:06:01.0690745Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.3.0) 2025-12-04T09:06:01.0755663Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.0.3) 2025-12-04T09:06:01.5124548Z Installing collected packages: torch 2025-12-04T09:06:14.8924705Z Successfully installed torch-2.10.0a0+gitffd9b0f 2025-12-04T09:06:14.9522499Z + export TERM=vt100 2025-12-04T09:06:14.9522851Z + TERM=vt100 2025-12-04T09:06:14.9523119Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:06:14.9530482Z + source .ci/pytorch/common.sh 2025-12-04T09:06:14.9533754Z +++ dirname .ci/pytorch/common.sh 2025-12-04T09:06:14.9540958Z ++ source .ci/pytorch/common_utils.sh 2025-12-04T09:06:14.9542302Z +++ declare -f -t trap_add 2025-12-04T09:06:14.9548198Z ++ set -ex -o pipefail 2025-12-04T09:06:14.9548665Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:06:14.9549054Z ++ BUILD_TEST_LIBTORCH=0 2025-12-04T09:06:14.9553339Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:06:14.9559754Z + source .ci/pytorch/common-build.sh 2025-12-04T09:06:14.9561452Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 != *win-* ]] 2025-12-04T09:06:14.9568526Z ++++ dirname .ci/pytorch/common-build.sh 2025-12-04T09:06:14.9577140Z +++ cd .ci/pytorch 2025-12-04T09:06:14.9577482Z +++ pwd -P 2025-12-04T09:06:14.9578189Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch 2025-12-04T09:06:14.9578683Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-pch* ]] 2025-12-04T09:06:14.9579082Z ++ which sccache 2025-12-04T09:06:14.9593773Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]] 2025-12-04T09:06:14.9594218Z ++ sccache --stop-server 2025-12-04T09:06:14.9618412Z ++ true 2025-12-04T09:06:14.9618742Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-12-04T09:06:14.9632691Z ++ trap_add sccache_epilogue EXIT 2025-12-04T09:06:14.9633110Z ++ trap_add_cmd=sccache_epilogue 2025-12-04T09:06:14.9633525Z ++ shift 2025-12-04T09:06:14.9633775Z ++ for trap_add_name in "$@" 2025-12-04T09:06:14.9636504Z ++++ trap -p EXIT 2025-12-04T09:06:14.9638739Z +++ eval 'extract_trap_cmd ' 2025-12-04T09:06:14.9639051Z ++++ extract_trap_cmd 2025-12-04T09:06:14.9639318Z ++++ printf '%s\n' '' 2025-12-04T09:06:14.9639611Z +++ printf '%s\n' sccache_epilogue 2025-12-04T09:06:14.9641036Z ++ trap -- ' 2025-12-04T09:06:14.9641278Z sccache_epilogue' EXIT 2025-12-04T09:06:14.9641554Z ++ [[ -n 1 ]] 2025-12-04T09:06:14.9641989Z ++ echo 'Skipping sccache server initialization, setting environment variables' 2025-12-04T09:06:14.9642683Z Skipping sccache server initialization, setting environment variables 2025-12-04T09:06:14.9643177Z ++ export 
SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:14.9643504Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:14.9643901Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:14.9644393Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:14.9651195Z ++ export RUST_LOG=sccache::server=error 2025-12-04T09:06:14.9651807Z ++ RUST_LOG=sccache::server=error 2025-12-04T09:06:14.9652135Z ++ sccache --zero-stats 2025-12-04T09:06:15.0857651Z Statistics zeroed. 2025-12-04T09:06:15.0859769Z ++ which ccache 2025-12-04T09:06:15.0872662Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *rocm* ]] 2025-12-04T09:06:15.0873179Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *s390x* ]] 2025-12-04T09:06:15.0873601Z + [[ -d /var/lib/jenkins/workspace ]] 2025-12-04T09:06:15.0874234Z ++ stat -c %u /var/lib/jenkins/workspace 2025-12-04T09:06:15.0890188Z + WORKSPACE_ORIGINAL_OWNER_ID=1000 2025-12-04T09:06:15.0890591Z + trap_add cleanup_workspace EXIT 2025-12-04T09:06:15.0890953Z + trap_add_cmd=cleanup_workspace 2025-12-04T09:06:15.0891255Z + shift 2025-12-04T09:06:15.0891502Z + for trap_add_name in "$@" 2025-12-04T09:06:15.0897665Z +++ trap -p EXIT 2025-12-04T09:06:15.0899965Z ++ eval 'extract_trap_cmd trap -- '\'' 2025-12-04T09:06:15.0900327Z sccache_epilogue'\'' EXIT' 2025-12-04T09:06:15.0900661Z +++ extract_trap_cmd trap -- ' 2025-12-04T09:06:15.0901005Z sccache_epilogue' EXIT 2025-12-04T09:06:15.0901277Z +++ printf '%s\n' ' 2025-12-04T09:06:15.0901548Z sccache_epilogue' 2025-12-04T09:06:15.0901837Z ++ printf '%s\n' cleanup_workspace 2025-12-04T09:06:15.0902528Z + trap -- ' 2025-12-04T09:06:15.0902771Z sccache_epilogue 2025-12-04T09:06:15.0903044Z cleanup_workspace' EXIT 2025-12-04T09:06:15.0903390Z + sudo chown -R jenkins /var/lib/jenkins/workspace 2025-12-04T09:06:15.7399571Z + git config --global --add safe.directory /var/lib/jenkins/workspace 2025-12-04T09:06:15.7418120Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:06:15.7419382Z ++ python -c 'import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))' 2025-12-04T09:06:16.1808037Z + NUMBA_CUDA_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:16.1809308Z + '[' -n /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ']' 2025-12-04T09:06:16.1810188Z +++ realpath .ci/pytorch/test.sh 2025-12-04T09:06:16.1818299Z ++ dirname /var/lib/jenkins/workspace/.ci/pytorch/test.sh 2025-12-04T09:06:16.1826378Z + NUMBA_PATCH=/var/lib/jenkins/workspace/.ci/pytorch/numba-cuda-13.patch 2025-12-04T09:06:16.1827574Z + pushd /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:16.1829124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ~/workspace 2025-12-04T09:06:16.1829744Z + patch -p4 2025-12-04T09:06:16.1843200Z patching file cudadrv/driver.py 2025-12-04T09:06:16.1843667Z Hunk #1 succeeded at 357 (offset -8 lines). 
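Note: the trap manipulation traced above registers sccache_epilogue and then cleanup_workspace on EXIT without the second handler overwriting the first. A minimal sketch of that pattern, simplified from what the trace shows (assumption: the real trap_add in .ci/pytorch/common_utils.sh also accepts arbitrary signal names and splices into the existing trap string; the stub handlers here are hypothetical stand-ins):

# Hypothetical stand-ins for the real handlers defined by the CI scripts.
sccache_epilogue()  { echo "sccache_epilogue"; }
cleanup_workspace() { echo "cleanup_workspace"; }

# Keep a list of handlers and run them all from a single EXIT trap,
# so registering cleanup_workspace does not clobber sccache_epilogue.
declare -a _exit_cmds=()
_run_exit_cmds() { local c; for c in "${_exit_cmds[@]}"; do eval "$c"; done; }
trap_add() { _exit_cmds+=("$1"); trap _run_exit_cmds EXIT; }

trap_add sccache_epilogue
trap_add cleanup_workspace   # on exit both run, in registration order, matching the accumulated trap in the trace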
2025-12-04T09:06:16.1852585Z + popd 2025-12-04T09:06:16.1853071Z ~/workspace 2025-12-04T09:06:16.1853488Z + echo 'Environment variables:' 2025-12-04T09:06:16.1854106Z Environment variables: 2025-12-04T09:06:16.1854520Z + env 2025-12-04T09:06:16.1862992Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:06:16.1863545Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:06:16.1863920Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:06:16.1864653Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:06:16.1865043Z HOSTNAME=9f53f9c599eb 2025-12-04T09:06:16.1866231Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.1867428Z GITHUB_ACTION=__run_3 2025-12-04T09:06:16.1867758Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:06:16.1868114Z GITHUB_RUN_NUMBER=158165 2025-12-04T09:06:16.1868530Z TEST_CONFIG=distributed 2025-12-04T09:06:16.1868837Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:06:16.1869209Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:06:16.1869571Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:16.1870012Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:06:16.1870347Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:06:16.1870665Z GITHUB_REF_TYPE=branch 2025-12-04T09:06:16.1870983Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.1871361Z XLA_CUDA= 2025-12-04T09:06:16.1871615Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:06:16.1872300Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:06:16.1872715Z *** 2025-12-04T09:06:16.1872959Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:06:16.1873271Z GITHUB_ACTIONS=true 2025-12-04T09:06:16.1873539Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:06:16.1873921Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:16.1874370Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.1874787Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.1875373Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/heads/main 2025-12-04T09:06:16.1875904Z UCC_HOME=/usr 2025-12-04T09:06:16.1876157Z VERBOSE_TEST_LOGS=False 2025-12-04T09:06:16.1876437Z GITHUB_REF=refs/heads/main 2025-12-04T09:06:16.1876726Z SHARD_NUMBER=3 2025-12-04T09:06:16.1876991Z GITHUB_REF_PROTECTED=true 2025-12-04T09:06:16.1877271Z HOME=/var/lib/jenkins 2025-12-04T09:06:16.1877580Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:06:16.1877950Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:06:16.1878330Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:06:16.1878713Z USE_SYSTEM_NCCL=1 2025-12-04T09:06:16.1878968Z NUM_TEST_SHARDS=3 2025-12-04T09:06:16.1879204Z UCX_HOME=/usr 2025-12-04T09:06:16.1879856Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.1880957Z JOB_NAME=linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:06:16.1882018Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.1882936Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:06:16.1883508Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:06:16.1883805Z DASHBOARD_TAG= 2025-12-04T09:06:16.1884094Z GITHUB_RUN_ID=19922768520 2025-12-04T09:06:16.1884372Z INSTALLED_OPENBLAS= 2025-12-04T09:06:16.1885085Z 
GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.1885869Z GITHUB_ACTOR=huydhn 2025-12-04T09:06:16.1886117Z PR_NUMBER= 2025-12-04T09:06:16.1886351Z DESIRED_CUDA=12.8.1 2025-12-04T09:06:16.1886612Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:06:16.1886989Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:06:16.1887377Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:06:16.1887770Z TERM=vt100 2025-12-04T09:06:16.1887997Z INSTALLED_VISION=yes 2025-12-04T09:06:16.1888266Z BRANCH=main 2025-12-04T09:06:16.1888516Z SCCACHE_REGION=us-east-1 2025-12-04T09:06:16.1888810Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:06:16.1889128Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:06:16.1889423Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:06:16.1890003Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:06:16.1890671Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:06:16.1891108Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:06:16.1891499Z REENABLED_ISSUES= 2025-12-04T09:06:16.1891735Z DOCS= 2025-12-04T09:06:16.1891952Z SHLVL=1 2025-12-04T09:06:16.1892171Z MAX_JOBS=46 2025-12-04T09:06:16.1892397Z GITHUB_ACTOR_ID=475357 2025-12-04T09:06:16.1892781Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.1893217Z GITHUB_REF_NAME=main 2025-12-04T09:06:16.1893628Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:06:16.1894110Z GITHUB_JOB=test 2025-12-04T09:06:16.1894364Z NO_TEST_TIMEOUT=False 2025-12-04T09:06:16.1894629Z TD_DISTRIBUTED=False 2025-12-04T09:06:16.1894922Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:06:16.1895260Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:06:16.1895547Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:06:16.1895848Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:06:16.1897025Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:16.1898078Z GITHUB_BASE_REF= 2025-12-04T09:06:16.1898326Z INSTALLED_ACL= 2025-12-04T09:06:16.1898851Z ARTIFACTS_FILE_SUFFIX=test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904 2025-12-04T09:06:16.1899453Z CI=true 2025-12-04T09:06:16.1899699Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:06:16.1900071Z RUST_LOG=sccache::server=error 2025-12-04T09:06:16.1900384Z JOB_ID=57116084904 2025-12-04T09:06:16.1900632Z GITHUB_HEAD_REF= 2025-12-04T09:06:16.1900891Z GITHUB_ACTION_REF= 2025-12-04T09:06:16.1901222Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:06:16.1901630Z TEST_SHOWLOCALS=False 2025-12-04T09:06:16.1901918Z GITHUB_WORKFLOW=trunk 2025-12-04T09:06:16.1902218Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:06:16.1902966Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.1903717Z NO_TD=False 2025-12-04T09:06:16.1903982Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:06:16.1904344Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:06:16.1904692Z _=/usr/bin/env 2025-12-04T09:06:16.1905107Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:16.1905720Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-12-04T09:06:16.2004428Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-12-04T09:06:16.2005287Z + 
TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-12-04T09:06:16.2005968Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-12-04T09:06:16.2006650Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-12-04T09:06:16.2007159Z + BUILD_DIR=build 2025-12-04T09:06:16.2007443Z + BUILD_RENAMED_DIR=build_renamed 2025-12-04T09:06:16.2007783Z + BUILD_BIN_DIR=build/bin 2025-12-04T09:06:16.2008063Z + SHARD_NUMBER=3 2025-12-04T09:06:16.2008325Z + NUM_TEST_SHARDS=3 2025-12-04T09:06:16.2008621Z + export TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:06:16.2008975Z + TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:06:16.2009271Z + export VALGRIND=ON 2025-12-04T09:06:16.2009535Z + VALGRIND=ON 2025-12-04T09:06:16.2009846Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *clang9* ]] 2025-12-04T09:06:16.2010455Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *xpu* ]] 2025-12-04T09:06:16.2010846Z + detect_cuda_arch 2025-12-04T09:06:16.2011161Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:06:16.2011543Z + command -v nvidia-smi 2025-12-04T09:06:16.2011827Z /usr/bin/nvidia-smi 2025-12-04T09:06:16.2012159Z ++ nvidia-smi --query-gpu=compute_cap --format=csv 2025-12-04T09:06:16.2012540Z ++ tail -n 1 2025-12-04T09:06:16.2502568Z + TORCH_CUDA_ARCH_LIST=7.5 2025-12-04T09:06:16.2503160Z + export TORCH_CUDA_ARCH_LIST 2025-12-04T09:06:16.2503788Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *s390x* ]] 2025-12-04T09:06:16.2504366Z + [[ 0 == \1 ]] 2025-12-04T09:06:16.2504623Z + [[ True == \1 ]] 2025-12-04T09:06:16.2504970Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *bazel* ]] 2025-12-04T09:06:16.2506785Z ++ realpath build/custom_test_artifacts 2025-12-04T09:06:16.2514603Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2025-12-04T09:06:16.2515594Z + [[ -n '' ]] 2025-12-04T09:06:16.2516003Z + echo 'Environment variables' 2025-12-04T09:06:16.2516311Z Environment variables 2025-12-04T09:06:16.2516577Z + env 2025-12-04T09:06:16.2523281Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:06:16.2523786Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:06:16.2524175Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:06:16.2524873Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:06:16.2525270Z HOSTNAME=9f53f9c599eb 2025-12-04T09:06:16.2526460Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.2527839Z GITHUB_ACTION=__run_3 2025-12-04T09:06:16.2528670Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:06:16.2529306Z GITHUB_RUN_NUMBER=158165 2025-12-04T09:06:16.2529852Z TEST_CONFIG=distributed 2025-12-04T09:06:16.2530405Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:06:16.2531064Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:06:16.2531745Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:16.2532627Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:06:16.2533386Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:06:16.2533974Z GITHUB_REF_TYPE=branch 2025-12-04T09:06:16.2534449Z TORCH_CUDA_ARCH_LIST=7.5 2025-12-04T09:06:16.2535092Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.2535779Z XLA_CUDA= 2025-12-04T09:06:16.2536281Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:06:16.2537579Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:06:16.2538213Z *** 2025-12-04T09:06:16.2538466Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:06:16.2538779Z GITHUB_ACTIONS=true 
2025-12-04T09:06:16.2539071Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:06:16.2539480Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:16.2539930Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.2540372Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.2540977Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/heads/main 2025-12-04T09:06:16.2541522Z UCC_HOME=/usr 2025-12-04T09:06:16.2541790Z TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:06:16.2542106Z VERBOSE_TEST_LOGS=False 2025-12-04T09:06:16.2542391Z GITHUB_REF=refs/heads/main 2025-12-04T09:06:16.2542688Z SHARD_NUMBER=3 2025-12-04T09:06:16.2542954Z GITHUB_REF_PROTECTED=true 2025-12-04T09:06:16.2543250Z HOME=/var/lib/jenkins 2025-12-04T09:06:16.2543572Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:06:16.2543955Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:06:16.2544353Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:06:16.2544734Z USE_SYSTEM_NCCL=1 2025-12-04T09:06:16.2544998Z NUM_TEST_SHARDS=3 2025-12-04T09:06:16.2545261Z UCX_HOME=/usr 2025-12-04T09:06:16.2545925Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.2547060Z JOB_NAME=linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:06:16.2548446Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.2549369Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:06:16.2549948Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:06:16.2550246Z DASHBOARD_TAG= 2025-12-04T09:06:16.2550506Z GITHUB_RUN_ID=19922768520 2025-12-04T09:06:16.2550791Z INSTALLED_OPENBLAS= 2025-12-04T09:06:16.2551505Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.2552292Z GITHUB_ACTOR=huydhn 2025-12-04T09:06:16.2552543Z PR_NUMBER= 2025-12-04T09:06:16.2552783Z DESIRED_CUDA=12.8.1 2025-12-04T09:06:16.2553049Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:06:16.2553304Z VALGRIND=ON 2025-12-04T09:06:16.2553562Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:06:16.2553944Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:06:16.2554331Z TERM=vt100 2025-12-04T09:06:16.2554574Z INSTALLED_VISION=yes 2025-12-04T09:06:16.2554845Z BRANCH=main 2025-12-04T09:06:16.2555081Z SCCACHE_REGION=us-east-1 2025-12-04T09:06:16.2555391Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:06:16.2555712Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:06:16.2556010Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:06:16.2556586Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:06:16.2557250Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:06:16.2557652Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:06:16.2558032Z REENABLED_ISSUES= 2025-12-04T09:06:16.2558283Z DOCS= 2025-12-04T09:06:16.2558605Z SHLVL=1 2025-12-04T09:06:16.2558815Z MAX_JOBS=46 2025-12-04T09:06:16.2559057Z GITHUB_ACTOR_ID=475357 2025-12-04T09:06:16.2559440Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.2559863Z GITHUB_REF_NAME=main 2025-12-04T09:06:16.2560295Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:06:16.2560780Z GITHUB_JOB=test 
2025-12-04T09:06:16.2561024Z NO_TEST_TIMEOUT=False 2025-12-04T09:06:16.2561297Z TD_DISTRIBUTED=False 2025-12-04T09:06:16.2561591Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:06:16.2561917Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:06:16.2562216Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:06:16.2562521Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:06:16.2563412Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:16.2564328Z GITHUB_BASE_REF= 2025-12-04T09:06:16.2564582Z INSTALLED_ACL= 2025-12-04T09:06:16.2565116Z ARTIFACTS_FILE_SUFFIX=test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904 2025-12-04T09:06:16.2565708Z CI=true 2025-12-04T09:06:16.2565942Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:06:16.2566296Z RUST_LOG=sccache::server=error 2025-12-04T09:06:16.2566597Z JOB_ID=57116084904 2025-12-04T09:06:16.2566843Z GITHUB_HEAD_REF= 2025-12-04T09:06:16.2567097Z GITHUB_ACTION_REF= 2025-12-04T09:06:16.2567421Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:06:16.2567806Z TEST_SHOWLOCALS=False 2025-12-04T09:06:16.2568086Z GITHUB_WORKFLOW=trunk 2025-12-04T09:06:16.2568377Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:06:16.2569085Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.2569824Z NO_TD=False 2025-12-04T09:06:16.2570080Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:06:16.2570410Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:06:16.2570917Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:16.2571398Z _=/usr/bin/env 2025-12-04T09:06:16.2571652Z + echo 'Testing pytorch' 2025-12-04T09:06:16.2571926Z Testing pytorch 2025-12-04T09:06:16.2572192Z + export LANG=C.UTF-8 2025-12-04T09:06:16.2572458Z + LANG=C.UTF-8 2025-12-04T09:06:16.2572685Z + PR_NUMBER= 2025-12-04T09:06:16.2573022Z + [[ distributed == \d\e\f\a\u\l\t ]] 2025-12-04T09:06:16.2573392Z + [[ distributed == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-12-04T09:06:16.2573789Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:06:16.2574190Z + [[ distributed == \s\l\o\w ]] 2025-12-04T09:06:16.2574595Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *slow-gradcheck* ]] 2025-12-04T09:06:16.2575067Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:06:16.2575494Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:06:16.2575888Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:06:16.2576362Z + [[ distributed == *crossref* ]] 2025-12-04T09:06:16.2576906Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:06:16.2577364Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *xpu* ]] 2025-12-04T09:06:16.2577826Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:06:16.2578228Z + pip_install ninja==1.10.2 2025-12-04T09:06:16.2578654Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-12-04T09:06:16.2579192Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-12-04T09:06:16.6466237Z Collecting ninja==1.10.2 2025-12-04T09:06:16.6732712Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-12-04T09:06:16.6843716Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-12-04T09:06:17.0797498Z Installing collected packages: ninja 2025-12-04T09:06:17.0797921Z Attempting uninstall: ninja 
2025-12-04T09:06:17.0802096Z Found existing installation: ninja 1.11.1.4 2025-12-04T09:06:17.0825290Z Uninstalling ninja-1.11.1.4: 2025-12-04T09:06:17.0894313Z Successfully uninstalled ninja-1.11.1.4 2025-12-04T09:06:17.1220355Z Successfully installed ninja-1.10.2 2025-12-04T09:06:17.1775347Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:17.1777586Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:17.1778752Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:06:17.1779233Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *asan* ]] 2025-12-04T09:06:17.1779706Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-debug* ]] 2025-12-04T09:06:17.1780190Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:06:17.1780854Z + echo 'We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc11. Expect the assertion to pass' 2025-12-04T09:06:17.1781688Z We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc11. Expect the assertion to pass 2025-12-04T09:06:17.1782242Z + cd test 2025-12-04T09:06:17.1782642Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-12-04T09:06:18.8854360Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-12-04T09:06:18.8854869Z + [[ distributed == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-12-04T09:06:18.8855335Z + [[ distributed == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-12-04T09:06:18.8855774Z + DYNAMO_BENCHMARK_FLAGS=() 2025-12-04T09:06:18.8856990Z + [[ distributed == *pr_time_benchmarks* ]] 2025-12-04T09:06:18.8857398Z + [[ distributed == *dynamo_eager* ]] 2025-12-04T09:06:18.8857754Z + [[ distributed == *aot_eager* ]] 2025-12-04T09:06:18.8858113Z + [[ distributed == *aot_inductor* ]] 2025-12-04T09:06:18.8858501Z + [[ distributed == *max_autotune_inductor* ]] 2025-12-04T09:06:18.8858883Z + [[ distributed == *inductor* ]] 2025-12-04T09:06:18.8859248Z + [[ distributed == *dynamic* ]] 2025-12-04T09:06:18.8859590Z + [[ distributed == *cpu* ]] 2025-12-04T09:06:18.8859913Z + [[ distributed == *xpu* ]] 2025-12-04T09:06:18.8860258Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-12-04T09:06:18.8887856Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:06:18.8888690Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-bazel-* ]] 2025-12-04T09:06:18.8889983Z + cd test 2025-12-04T09:06:18.8890934Z + python -c 'import torch; print(torch.__config__.show())' 2025-12-04T09:06:21.1148249Z PyTorch built with: 2025-12-04T09:06:21.1148644Z - GCC 11.4 2025-12-04T09:06:21.1149005Z - C++ Version: 201703 2025-12-04T09:06:21.1149653Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:06:21.1150484Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:06:21.1150997Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T09:06:21.1151373Z - LAPACK is enabled (usually provided by MKL) 2025-12-04T09:06:21.1151785Z - NNPACK is enabled 2025-12-04T09:06:21.1152089Z - CPU capability usage: AVX512 2025-12-04T09:06:21.1152406Z - CUDA Runtime 12.8 2025-12-04T09:06:21.1152967Z - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_89,code=sm_89 2025-12-04T09:06:21.1153611Z - CuDNN 91.0.2 (built against CUDA 12.9) 2025-12-04T09:06:21.1159207Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=35b7a9a26c5923d98aebaa41a031dae21788a9ee, CUDA_VERSION=12.8, CUDNN_VERSION=9.10.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-12-04T09:06:21.1166026Z 2025-12-04T09:06:21.5847252Z + cd test 2025-12-04T09:06:21.5847689Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-12-04T09:06:23.0129805Z ATen/Parallel: 2025-12-04T09:06:23.0130195Z at::get_num_threads() : 24 2025-12-04T09:06:23.0130543Z at::get_num_interop_threads() : 24 2025-12-04T09:06:23.0130908Z OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T09:06:23.0131290Z omp_get_max_threads() : 24 2025-12-04T09:06:23.0131949Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:06:23.0132659Z mkl_get_max_threads() : 24 2025-12-04T09:06:23.0133123Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:06:23.0133754Z std::thread::hardware_concurrency() : 48 2025-12-04T09:06:23.0134103Z Environment variables: 2025-12-04T09:06:23.0134395Z OMP_NUM_THREADS : [not set] 2025-12-04T09:06:23.0134706Z MKL_NUM_THREADS : [not set] 2025-12-04T09:06:23.0135009Z ATen parallel backend: OpenMP 2025-12-04T09:06:23.0135230Z 2025-12-04T09:06:23.2876977Z + [[ distributed == *numpy_2* ]] 2025-12-04T09:06:23.2877495Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:06:23.2877930Z + [[ distributed == *backward* ]] 2025-12-04T09:06:23.2878325Z + [[ distributed == *libtorch_agnostic_targetting* ]] 2025-12-04T09:06:23.2878723Z + [[ distributed == *xla* ]] 2025-12-04T09:06:23.2879068Z + [[ distributed == *vllm* ]] 2025-12-04T09:06:23.2879392Z + [[ distributed == *executorch* ]] 2025-12-04T09:06:23.2879738Z + [[ distributed == \j\i\t\_\l\e\g\a\c\y ]] 2025-12-04T09:06:23.2880123Z + [[ distributed == \q\u\a\n\t\i\z\a\t\i\o\n ]] 2025-12-04T09:06:23.2880901Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:06:23.2881395Z + [[ distributed == distributed ]] 2025-12-04T09:06:23.2881710Z + test_distributed 2025-12-04T09:06:23.2882007Z + echo 'Testing distributed python tests' 2025-12-04T09:06:23.2882387Z Testing distributed python tests 2025-12-04T09:06:23.2882841Z + python test/run_test.py --distributed-tests --shard 3 3 --verbose 2025-12-04T09:06:28.8456680Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2025-12-04T09:06:28.8978084Z Ignoring disabled issues: [''] 2025-12-04T09:06:28.9084342Z Found test times from artifacts 2025-12-04T09:06:28.9498487Z Found test times from artifacts 2025-12-04T09:06:28.9515237Z Running all tests 2025-12-04T09:06:28.9685408Z Running parallel tests on 1 processes 2025-12-04T09:06:28.9686667Z Name: tests to run (est. 
time: 140.02min) 2025-12-04T09:06:28.9687030Z Serial tests (83): 2025-12-04T09:06:28.9687362Z distributed/test_c10d_functional_native 1/1 2025-12-04T09:06:28.9687777Z distributed/fsdp/test_fsdp_overlap 1/1 2025-12-04T09:06:28.9688201Z distributed/fsdp/test_fsdp_pure_fp16 1/1 2025-12-04T09:06:28.9688606Z distributed/tensor/debug/test_debug_mode 1/1 2025-12-04T09:06:28.9689007Z distributed/fsdp/test_fsdp_exec_order 1/1 2025-12-04T09:06:28.9689437Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 2025-12-04T09:06:28.9689885Z distributed/fsdp/test_fsdp_clip_grad_norm 1/1 2025-12-04T09:06:28.9690289Z distributed/fsdp/test_fsdp_core 2/2 2025-12-04T09:06:28.9690647Z distributed/algorithms/test_join 1/1 2025-12-04T09:06:28.9691074Z distributed/pipelining/test_schedule_multiproc 1/1 2025-12-04T09:06:28.9691803Z distributed/test_compute_comm_reordering 1/1 2025-12-04T09:06:28.9692190Z distributed/test_cupy_as_tensor 1/1 2025-12-04T09:06:28.9692557Z distributed/fsdp/test_fsdp_fx 1/1 2025-12-04T09:06:28.9692914Z distributed/_tools/test_sac_ilp 1/1 2025-12-04T09:06:28.9693294Z distributed/checkpoint/test_hf_storage 1/1 2025-12-04T09:06:28.9693701Z distributed/pipelining/test_microbatch 1/1 2025-12-04T09:06:28.9694111Z distributed/tensor/test_placement_types 1/1 2025-12-04T09:06:28.9694573Z distributed/tensor/test_dtensor_dispatch_overhead 1/1 2025-12-04T09:06:28.9695120Z distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 2025-12-04T09:06:28.9695643Z distributed/checkpoint/test_format_utils 1/1 2025-12-04T09:06:28.9696204Z distributed/test_aten_comm_compute_reordering 1/2 2025-12-04T09:06:28.9696805Z distributed/tensor/test_redistribute 2/2 2025-12-04T09:06:28.9697233Z distributed/tensor/parallel/test_tp_style 1/1 2025-12-04T09:06:28.9697660Z distributed/tensor/test_api 1/1 2025-12-04T09:06:28.9698020Z distributed/checkpoint/test_fsspec 1/1 2025-12-04T09:06:28.9698480Z distributed/tensor/experimental/test_tp_transform 1/1 2025-12-04T09:06:28.9698954Z distributed/checkpoint/test_traverse 1/1 2025-12-04T09:06:28.9699358Z distributed/tensor/test_random_ops 1/1 2025-12-04T09:06:28.9699812Z distributed/_composable/fsdp/test_fully_shard_logging 1/1 2025-12-04T09:06:28.9700283Z distributed/launcher/test_api 1/1 2025-12-04T09:06:28.9700705Z distributed/elastic/multiprocessing/test_api 1/1 2025-12-04T09:06:28.9701130Z distributed/fsdp/test_shard_utils 1/1 2025-12-04T09:06:28.9701560Z distributed/checkpoint/test_fsdp_optim_state 1/1 2025-12-04T09:06:28.9702054Z distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 2025-12-04T09:06:28.9702543Z distributed/checkpoint/test_dtensor_resharding 1/1 2025-12-04T09:06:28.9702982Z distributed/fsdp/test_fsdp_memory 1/1 2025-12-04T09:06:28.9703397Z distributed/tensor/test_pointwise_ops 1/1 2025-12-04T09:06:28.9703831Z distributed/checkpoint/test_compatibility 1/1 2025-12-04T09:06:28.9704246Z distributed/_tools/test_mem_tracker 1/1 2025-12-04T09:06:28.9704649Z distributed/elastic/test_control_plane 1/1 2025-12-04T09:06:28.9705174Z distributed/test_fake_pg 1/1 2025-12-04T09:06:28.9705566Z distributed/checkpoint/test_fsdp_model_state 1/1 2025-12-04T09:06:28.9706000Z distributed/test_functional_api 1/1 2025-12-04T09:06:28.9706504Z distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 2025-12-04T09:06:28.9707023Z distributed/tensor/debug/test_comm_mode 1/1 2025-12-04T09:06:28.9707419Z distributed/test_dist2 1/1 2025-12-04T09:06:28.9707866Z distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 2025-12-04T09:06:28.9708342Z 
distributed/launcher/test_run 1/1 2025-12-04T09:06:28.9708859Z distributed/fsdp/test_fsdp_backward_prefetch 1/1 2025-12-04T09:06:28.9709298Z distributed/checkpoint/test_checkpoint 1/1 2025-12-04T09:06:28.9709703Z distributed/_pycute/test_coalesce 1/1 2025-12-04T09:06:28.9710075Z distributed/_pycute/test_complement 1/1 2025-12-04T09:06:28.9710468Z distributed/_pycute/test_composition 1/1 2025-12-04T09:06:28.9710856Z distributed/_pycute/test_int_tuple 1/1 2025-12-04T09:06:28.9711234Z distributed/_pycute/test_left_inverse 1/1 2025-12-04T09:06:28.9711635Z distributed/_pycute/test_right_inverse 1/1 2025-12-04T09:06:28.9712040Z distributed/_composable/test_replicate 1/1 2025-12-04T09:06:28.9712453Z distributed/checkpoint/test_hsdp_checkpoint 1/1 2025-12-04T09:06:28.9712928Z distributed/tensor/parallel/test_parallelize_api 1/1 2025-12-04T09:06:28.9713372Z distributed/fsdp/test_fsdp_state_dict 1/2 2025-12-04T09:06:28.9713757Z distributed/_pycute/test_typing 1/1 2025-12-04T09:06:28.9714119Z distributed/test_distributed_spawn 1/9 2025-12-04T09:06:28.9714500Z distributed/test_distributed_spawn 4/9 2025-12-04T09:06:28.9714950Z distributed/test_distributed_spawn 7/9 2025-12-04T09:06:28.9715318Z distributed/test_serialization 1/1 2025-12-04T09:06:28.9715719Z distributed/fsdp/test_fsdp_ignored_modules 1/1 2025-12-04T09:06:28.9716192Z distributed/_composable/fsdp/test_fully_shard_comm 1/1 2025-12-04T09:06:28.9716670Z distributed/fsdp/test_fsdp_sharded_grad_scaler 1/1 2025-12-04T09:06:28.9717162Z distributed/_shard/sharding_plan/test_sharding_plan 1/1 2025-12-04T09:06:28.9717675Z distributed/_shard/sharded_optim/test_sharded_optim 1/1 2025-12-04T09:06:28.9718209Z distributed/_composable/fsdp/test_fully_shard_state_dict 1/1 2025-12-04T09:06:28.9718669Z distributed/tensor/test_utils 1/1 2025-12-04T09:06:28.9719106Z distributed/_composable/fsdp/test_fully_shard_memory 1/1 2025-12-04T09:06:28.9719577Z distributed/checkpoint/test_state_dict 1/1 2025-12-04T09:06:28.9719994Z distributed/checkpoint/test_state_dict_utils 1/1 2025-12-04T09:06:28.9720413Z distributed/rpc/test_faulty_agent 1/1 2025-12-04T09:06:28.9721079Z distributed/_shard/sharded_tensor/ops/test_embedding 1/1 2025-12-04T09:06:28.9721813Z distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 1/1 2025-12-04T09:06:28.9722324Z distributed/test_c10d_spawn_nccl 1/1 2025-12-04T09:06:28.9722703Z distributed/test_c10d_spawn_ucc 1/1 2025-12-04T09:06:28.9723080Z distributed/test_c10d_gloo 1/2 2025-12-04T09:06:28.9723505Z distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 2025-12-04T09:06:28.9723968Z distributed/test_c10d_nccl 3/3 2025-12-04T09:06:28.9724309Z Parallel tests (0): 2025-12-04T09:06:28.9724598Z Name: excluded (est. time: 0.0min) 2025-12-04T09:06:28.9724930Z Serial tests (0): 2025-12-04T09:06:28.9725205Z Parallel tests (0): 2025-12-04T09:06:28.9725750Z Running distributed/test_c10d_functional_native 1/1 ... [2025-12-04 09:06:28.969400][820.577319465] 2025-12-04T09:06:28.9726388Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:06:28.9727720Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_functional_native.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:06:28.969799] 2025-12-04T09:10:58.2980715Z 2025-12-04T09:10:58.2982314Z distributed/test_c10d_functional_native 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_functional_native_1.1_5ceb4f282067967e_.log 2025-12-04T09:10:58.2999052Z Running 33 items in this shard: test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_gather_into_tensor_coalesced, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_gather_into_tensor_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_coalesced, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_coalesced_, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_single_, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_to_all_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_broadcast, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_fixed_striding, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_functional_collectives_inference_mode, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_inductor_dtypeview_memory_leak, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_reduce_scatter_tensor_out, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_reduce_scatter_tensor_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_threading, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_unwaited, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_wait_tensor, test/distributed/test_c10d_functional_native.py::PyWorkTest::test_collectives, test/distributed/test_c10d_functional_native.py::PyWorkTest::test_wait_tensor, test/distributed/test_c10d_functional_native.py::CompileTestCPU::test_inductor_all_reduce_cpu, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_gather_into_tensor_coalesced, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_gather_into_tensor_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_coalesced, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_non_contig_input, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_to_all_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_broadcast, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_inplace_op_on_view, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_reduce_scatter_tensor_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_reuse_buffer_after_inplace_collective, test/distributed/test_c10d_functional_native.py::CompileTest::test_ranks_and_tag, test/distributed/test_c10d_functional_native.py::CompileTest::test_wait_tensor 2025-12-04T09:10:58.3015624Z 2025-12-04T09:10:58.3016059Z Finished distributed/test_c10d_functional_native 1/1 ... 
[2025-12-04 09:10:58.297414][1089.905329516], took 4.49min 2025-12-04T09:10:58.3017880Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_functional_native/distributed.test_c10d_functional_native-369cc3de9e188dd1.xml 2025-12-04T09:10:58.7216419Z Uploading artifacts took 0.13 seconds 2025-12-04T09:10:58.7217812Z Running distributed/fsdp/test_fsdp_overlap 1/1 ... [2025-12-04 09:10:58.721573][1090.329489371] 2025-12-04T09:10:58.7218480Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:10:58.7222844Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_overlap.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:10:58.721945] 2025-12-04T09:11:56.7673575Z 2025-12-04T09:11:56.7674530Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_overlap 1/1 (test/test-reports/distributed.fsdp.test_fsdp_overlap_1.1_6a5a97322901a03e_.log) 2025-12-04T09:11:56.7675948Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-39c8c10a0ef1a34e.xml 2025-12-04T09:11:56.7676907Z ============================= test session starts ============================== 2025-12-04T09:11:56.7677596Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:11:56.7678176Z cachedir: .pytest_cache 2025-12-04T09:11:56.7678873Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:11:56.7679654Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:11:56.7692068Z configfile: pytest.ini 2025-12-04T09:11:56.7692992Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:11:56.7693784Z collecting ... collected 1 item 2025-12-04T09:11:56.7694188Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:11:56.7695115Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T09:11:56.7695823Z 2025-12-04T09:11:56.7697126Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 09:11:02.144000 14371 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 14423 2025-12-04T09:11:56.7699658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:11:56.7700938Z _init_core_state( 2025-12-04T09:11:56.7703233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:11:56.7705446Z _warn_cpu_init() 2025-12-04T09:11:56.7706052Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:11:56.7707165Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:11:56.7708812Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7710516Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:11:56.7712075Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7713519Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:11:56.7715254Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7716821Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7718384Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7719945Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7722076Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7723668Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:11:56.7725335Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7726944Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:11:56.7729280Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 
2025-12-04T09:11:56.7731489Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7732638Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7734508Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7736084Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7737541Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7738948Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:11:56.7739747Z dist init r=0, world=1 2025-12-04T09:11:56.7739926Z 2025-12-04T09:11:56.7740033Z rank0: 2025-12-04T09:11:56.7740668Z e1: {'cpu_iter': 0.0018478633000004407, 'cpu_wait': 2.8651499999376995e-05, 'gpu_compute': 0.010128000122494995, 'gpu_total': 0.8250816106796265} 2025-12-04T09:11:56.7741787Z e2: {'cpu_iter': 0.004702577100000127, 'cpu_wait': 3.239280000002509e-05, 'gpu_compute': 0.13931520022451876, 'gpu_total': 2.1374751806259153} 2025-12-04T09:11:56.7742861Z e3: {'cpu_iter': 0.0019113367999999298, 'cpu_wait': 0.15016560569999998, 'gpu_compute': 152.52830657958984, 'gpu_total': 152.9109375} 2025-12-04T09:11:56.7743908Z e4: {'cpu_iter': 0.004734771799999926, 'cpu_wait': 0.1485989104000005, 'gpu_compute': 152.59314613342286, 'gpu_total': 153.189501953125} 2025-12-04T09:11:56.7745693Z [rank0]:[W1204 09:11:13.102715704 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:11:56.7747189Z FAILED [13.2049s] [100%] 2025-12-04T09:11:56.7747393Z 2025-12-04T09:11:56.7747547Z =================================== FAILURES =================================== 2025-12-04T09:11:56.7748172Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T09:11:56.7748859Z Traceback (most recent call last): 2025-12-04T09:11:56.7749684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:11:56.7750401Z self._join_processes(fn) 2025-12-04T09:11:56.7751124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:11:56.7751899Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:11:56.7752693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:11:56.7753472Z raise RuntimeError(error) 2025-12-04T09:11:56.7753886Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7754320Z Traceback (most recent call last): 2025-12-04T09:11:56.7755020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7755737Z getattr(self, test_name)() 2025-12-04T09:11:56.7756397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7757087Z fn() 2025-12-04T09:11:56.7757668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7758433Z method(*args, **kwargs) 2025-12-04T09:11:56.7759065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7759752Z method(*args, **kwargs) 2025-12-04T09:11:56.7760397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7761057Z with policy(): 2025-12-04T09:11:56.7761867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7762596Z raise RuntimeError(msg) 2025-12-04T09:11:56.7763918Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7765160Z 2025-12-04T09:11:56.7765366Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7766321Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7767080Z 2025-12-04T09:11:56.7767339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7767720Z 2025-12-04T09:11:56.7767725Z 2025-12-04T09:11:56.7767950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:11:56.7768551Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:11:56.7769712Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-39c8c10a0ef1a34e.xml - 2025-12-04T09:11:56.7770794Z =========================== short test summary info ============================ 2025-12-04T09:11:56.7771902Z FAILED [13.2049s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7772942Z Traceback (most recent call last): 2025-12-04T09:11:56.7773845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7774573Z getattr(self, test_name)() 2025-12-04T09:11:56.7775255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7775934Z fn() 2025-12-04T09:11:56.7776584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7777495Z method(*args, **kwargs) 2025-12-04T09:11:56.7778201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7778968Z method(*args, **kwargs) 2025-12-04T09:11:56.7779671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7780412Z with policy(): 2025-12-04T09:11:56.7781100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7781875Z raise RuntimeError(msg) 2025-12-04T09:11:56.7783262Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7784592Z 2025-12-04T09:11:56.7784809Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7785822Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7786697Z 2025-12-04T09:11:56.7786964Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7787556Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:11:56.7788029Z ============================== 1 failed in 13.42s ============================== 2025-12-04T09:11:56.7788430Z Got exit code 1 2025-12-04T09:11:56.7788808Z Retrying single test... 
2025-12-04T09:11:56.7789725Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-bb36a88bac557029.xml 2025-12-04T09:11:56.7790639Z ============================= test session starts ============================== 2025-12-04T09:11:56.7791266Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:11:56.7791829Z cachedir: .pytest_cache 2025-12-04T09:11:56.7792489Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:11:56.7793225Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:11:56.7793561Z configfile: pytest.ini 2025-12-04T09:11:56.7794237Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:11:56.7795171Z collecting ... collected 1 item 2025-12-04T09:11:56.7796107Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T09:11:56.7797061Z Running 1 items in this shard 2025-12-04T09:11:56.7797263Z 2025-12-04T09:11:56.7798281Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 09:11:19.414000 14494 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 14546 2025-12-04T09:11:56.7800427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:11:56.7801655Z _init_core_state( 2025-12-04T09:11:56.7803847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:11:56.7806129Z _warn_cpu_init() 2025-12-04T09:11:56.7806794Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:11:56.7807816Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:11:56.7809319Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7810795Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:11:56.7812259Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7813783Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:11:56.7815203Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7817017Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7818622Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7820219Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7822012Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7823574Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:11:56.7825141Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7826754Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:11:56.7829024Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 
2025-12-04T09:11:56.7831181Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7832363Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7834456Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7836022Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7837203Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7838574Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:11:56.7839348Z dist init r=0, world=1 2025-12-04T09:11:56.7839520Z 2025-12-04T09:11:56.7839621Z rank0: 2025-12-04T09:11:56.7840220Z e1: {'cpu_iter': 0.0021133577999997042, 'cpu_wait': 3.100590000038039e-05, 'gpu_compute': 0.009472000156529247, 'gpu_total': 0.8031007945537567} 2025-12-04T09:11:56.7841305Z e2: {'cpu_iter': 0.005042445299999976, 'cpu_wait': 3.310709999997386e-05, 'gpu_compute': 0.13894400056451559, 'gpu_total': 2.1602207899093626} 2025-12-04T09:11:56.7842367Z e3: {'cpu_iter': 0.0021608666999997084, 'cpu_wait': 0.14332368769999987, 'gpu_compute': 146.0892475128174, 'gpu_total': 146.47476654052736} 2025-12-04T09:11:56.7843409Z e4: {'cpu_iter': 0.005032401400000275, 'cpu_wait': 0.14177950029999967, 'gpu_compute': 146.0942470550537, 'gpu_total': 146.6829818725586} 2025-12-04T09:11:56.7845126Z [rank0]:[W1204 09:11:30.083323253 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:11:56.7846548Z FAILED [12.7267s] [100%] 2025-12-04T09:11:56.7846739Z 2025-12-04T09:11:56.7846886Z =================================== FAILURES =================================== 2025-12-04T09:11:56.7847489Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T09:11:56.7848044Z Traceback (most recent call last): 2025-12-04T09:11:56.7848815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:11:56.7849592Z self._join_processes(fn) 2025-12-04T09:11:56.7850368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:11:56.7851205Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:11:56.7852062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:11:56.7852907Z raise RuntimeError(error) 2025-12-04T09:11:56.7853345Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7853815Z Traceback (most recent call last): 2025-12-04T09:11:56.7854583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7855357Z getattr(self, test_name)() 2025-12-04T09:11:56.7856078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7857079Z fn() 2025-12-04T09:11:56.7857739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7858505Z method(*args, **kwargs) 2025-12-04T09:11:56.7859210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7859985Z method(*args, **kwargs) 2025-12-04T09:11:56.7860696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7861434Z with policy(): 2025-12-04T09:11:56.7862185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7862964Z raise RuntimeError(msg) 2025-12-04T09:11:56.7864357Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7865675Z 2025-12-04T09:11:56.7865893Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7866912Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7867720Z 2025-12-04T09:11:56.7868106Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7868497Z 2025-12-04T09:11:56.7868501Z 2025-12-04T09:11:56.7868726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:11:56.7869420Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:11:56.7870534Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-bb36a88bac557029.xml - 2025-12-04T09:11:56.7871562Z =========================== short test summary info ============================ 2025-12-04T09:11:56.7872594Z FAILED [12.7267s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7873654Z Traceback (most recent call last): 2025-12-04T09:11:56.7874367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7875088Z getattr(self, test_name)() 2025-12-04T09:11:56.7875764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7876461Z fn() 2025-12-04T09:11:56.7877049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7877733Z method(*args, **kwargs) 2025-12-04T09:11:56.7878363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7879049Z method(*args, **kwargs) 2025-12-04T09:11:56.7879692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7880360Z with policy(): 2025-12-04T09:11:56.7880973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7881661Z raise RuntimeError(msg) 2025-12-04T09:11:56.7882909Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7884072Z 2025-12-04T09:11:56.7884268Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7885175Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7885887Z 2025-12-04T09:11:56.7886122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7886653Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:11:56.7887067Z ============================== 1 failed in 12.94s ============================== 2025-12-04T09:11:56.7887421Z Got exit code 1 2025-12-04T09:11:56.7887715Z Retrying single test... 
2025-12-04T09:11:56.7888464Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-9b6f6e417d9b4600.xml 2025-12-04T09:11:56.7889315Z ============================= test session starts ============================== 2025-12-04T09:11:56.7889903Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:11:56.7890438Z cachedir: .pytest_cache 2025-12-04T09:11:56.7891057Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:11:56.7891751Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:11:56.7892073Z configfile: pytest.ini 2025-12-04T09:11:56.7892708Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:11:56.7893420Z collecting ... collected 1 item 2025-12-04T09:11:56.7894284Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T09:11:56.7895346Z Running 1 items in this shard 2025-12-04T09:11:56.7895543Z 2025-12-04T09:11:56.7896605Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 09:11:36.214000 14617 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 14669 2025-12-04T09:11:56.7898996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:11:56.7900335Z _init_core_state( 2025-12-04T09:11:56.7902536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:11:56.7904789Z _warn_cpu_init() 2025-12-04T09:11:56.7905412Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:11:56.7906554Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:11:56.7908249Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7909966Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:11:56.7911598Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7912948Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:11:56.7914296Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7915719Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7917196Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7918623Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7920040Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7921778Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:11:56.7923346Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7924968Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:11:56.7927253Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 
2025-12-04T09:11:56.7929384Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7930565Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7932609Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7934362Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7935462Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7936941Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:11:56.7937741Z dist init r=0, world=1 2025-12-04T09:11:56.7937921Z 2025-12-04T09:11:56.7938021Z rank0: 2025-12-04T09:11:56.7938649Z e1: {'cpu_iter': 0.0018660827999998019, 'cpu_wait': 2.8354200000002548e-05, 'gpu_compute': 0.009180800034664571, 'gpu_total': 0.8075520098209381} 2025-12-04T09:11:56.7939771Z e2: {'cpu_iter': 0.004684944300000282, 'cpu_wait': 3.31210999995335e-05, 'gpu_compute': 0.14296319913119077, 'gpu_total': 2.1671871900558473} 2025-12-04T09:11:56.7940860Z e3: {'cpu_iter': 0.0018834125999999784, 'cpu_wait': 0.22940703660000014, 'gpu_compute': 231.5696128845215, 'gpu_total': 231.934912109375} 2025-12-04T09:11:56.7941930Z e4: {'cpu_iter': 0.004714327500000693, 'cpu_wait': 0.2277740248999999, 'gpu_compute': 231.66344566345214, 'gpu_total': 232.25556335449218} 2025-12-04T09:11:56.7943710Z [rank0]:[W1204 09:11:50.346849138 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:11:56.7945092Z FAILED [16.2794s] [100%] 2025-12-04T09:11:56.7945292Z 2025-12-04T09:11:56.7945442Z =================================== FAILURES =================================== 2025-12-04T09:11:56.7946056Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T09:11:56.7946635Z Traceback (most recent call last): 2025-12-04T09:11:56.7947500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:11:56.7948296Z self._join_processes(fn) 2025-12-04T09:11:56.7949196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:11:56.7950072Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:11:56.7950859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:11:56.7951628Z raise RuntimeError(error) 2025-12-04T09:11:56.7952031Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7952466Z Traceback (most recent call last): 2025-12-04T09:11:56.7953167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7953878Z getattr(self, test_name)() 2025-12-04T09:11:56.7954544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7955233Z fn() 2025-12-04T09:11:56.7955816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7956491Z method(*args, **kwargs) 2025-12-04T09:11:56.7957117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7957792Z method(*args, **kwargs) 2025-12-04T09:11:56.7958427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7959150Z with policy(): 2025-12-04T09:11:56.7959759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7960445Z raise RuntimeError(msg) 2025-12-04T09:11:56.7961671Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7962838Z 2025-12-04T09:11:56.7963034Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7963913Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7964612Z 2025-12-04T09:11:56.7964845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7965199Z 2025-12-04T09:11:56.7965203Z 2025-12-04T09:11:56.7965402Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:11:56.7965939Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:11:56.7967026Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-9b6f6e417d9b4600.xml - 2025-12-04T09:11:56.7968036Z =========================== short test summary info ============================ 2025-12-04T09:11:56.7969055Z FAILED [16.2794s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7970018Z Traceback (most recent call last): 2025-12-04T09:11:56.7970706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7971411Z getattr(self, test_name)() 2025-12-04T09:11:56.7972080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7972763Z fn() 2025-12-04T09:11:56.7973414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7974096Z method(*args, **kwargs) 2025-12-04T09:11:56.7974733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7975393Z method(*args, **kwargs) 2025-12-04T09:11:56.7976030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7976947Z with policy(): 2025-12-04T09:11:56.7977668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7978452Z raise RuntimeError(msg) 2025-12-04T09:11:56.7979857Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7981179Z 2025-12-04T09:11:56.7981408Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7982416Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7983210Z 2025-12-04T09:11:56.7983475Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7984063Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:11:56.7984545Z ============================== 1 failed in 16.49s ============================== 2025-12-04T09:11:56.7984997Z Got exit code 1 2025-12-04T09:11:56.7985738Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T09:11:56.7986864Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:11:56.7988068Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-83c25fe932c36613.xml 2025-12-04T09:11:56.7989128Z ============================= test session starts ============================== 2025-12-04T09:11:56.7989719Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:11:56.7990254Z cachedir: .pytest_cache 2025-12-04T09:11:56.7990887Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:11:56.7991578Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:11:56.7991894Z configfile: pytest.ini 2025-12-04T09:11:56.7992548Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:11:56.7993323Z collecting ... collected 1 item / 1 deselected / 0 selected 2025-12-04T09:11:56.7993753Z stepcurrent: skipping 1 already run items. 2025-12-04T09:11:56.7994097Z Running 0 items in this shard 2025-12-04T09:11:56.7994283Z 2025-12-04T09:11:56.7995042Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-83c25fe932c36613.xml - 2025-12-04T09:11:56.7996040Z ============================ 1 deselected in 0.01s ============================= 2025-12-04T09:11:56.7996929Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda'] 2025-12-04T09:11:56.7997667Z 2025-12-04T09:11:56.7998234Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_overlap 1/1 (test/test-reports/distributed.fsdp.test_fsdp_overlap_1.1_6a5a97322901a03e_.log) 2025-12-04T09:11:56.7998922Z 2025-12-04T09:11:56.7999342Z Finished distributed/fsdp/test_fsdp_overlap 1/1 ... [2025-12-04 09:11:56.767099][1148.375015758], took 0.97min 2025-12-04T09:11:56.8000605Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-39c8c10a0ef1a34e.xml 2025-12-04T09:11:56.8634503Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-bb36a88bac557029.xml 2025-12-04T09:11:56.8953703Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-9b6f6e417d9b4600.xml 2025-12-04T09:11:56.9373730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-83c25fe932c36613.xml 2025-12-04T09:11:57.1134303Z Uploading logs for 57116084904 to S3 2025-12-04T09:11:57.1510844Z Uploading artifacts took 0.18 seconds 2025-12-04T09:11:57.1511334Z distributed/fsdp/test_fsdp_overlap 1/1 failed! 
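All three retries above fail the CUDA memory-leak check that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: the policy context manager records the caching-allocator and driver allocation counters before the test and raises in __exit__ when they have grown afterwards (here 512 -> 4608 bytes cached and 713949184 -> 716046336 bytes driver-allocated on device 0). The log also carries two recommendations emitted by PyTorch itself: the NCCL warning asks for destroy_process_group() to be called before the process exits, and the FSDP init warning suggests passing device_id so a CPU-resident module is moved to GPU for sharding initialization. The sketch below only illustrates those two recommendations under assumed names; it is not the code of test_fsdp_overlap.py, run()/rank/world_size are placeholders, and it assumes MASTER_ADDR and MASTER_PORT are already set in the environment, as they are inside the test harness.

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def run(rank: int, world_size: int) -> None:
    # Hedged sketch, not taken from the failing test. Assumes MASTER_ADDR and
    # MASTER_PORT are set so init_process_group can rendezvous.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Wrap with an explicit device index so FSDP performs sharding
    # initialization on GPU instead of warning about a CPU module.
    model = FSDP(torch.nn.Linear(8, 8), device_id=torch.cuda.current_device())

    out = model(torch.randn(4, 8, device="cuda"))
    out.sum().backward()

    # Release NCCL communicators before exit, addressing the
    # "destroy_process_group() was not called" warning above.
    dist.destroy_process_group()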
2025-12-04T09:11:57.1515777Z Running distributed/fsdp/test_fsdp_pure_fp16 1/1 ... [2025-12-04 09:11:57.151133][1148.759050119] 2025-12-04T09:11:57.1516371Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:11:57.1517642Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_pure_fp16.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:11:57.151443] 2025-12-04T09:13:23.6624039Z 2025-12-04T09:13:23.6626569Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_pure_fp16 1/1 (test/test-reports/distributed.fsdp.test_fsdp_pure_fp16_1.1_2de43ef0fea2c555_.log) 2025-12-04T09:13:23.6630252Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-e1278d34de852f2a.xml 2025-12-04T09:13:23.6631266Z ============================= test session starts ============================== 2025-12-04T09:13:23.6631949Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.6632547Z cachedir: .pytest_cache 2025-12-04T09:13:23.6633274Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.6634068Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.6634411Z configfile: pytest.ini 2025-12-04T09:13:23.6635133Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.6635943Z collecting ... collected 2 items 2025-12-04T09:13:23.6636366Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:13:23.6637632Z Running 2 items in this shard: test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda, test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T09:13:23.6638709Z 2025-12-04T09:13:23.6639607Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda I1204 09:12:00.594000 14796 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 14848 2025-12-04T09:13:23.6641136Z I1204 09:12:00.595000 14796 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 14849 2025-12-04T09:13:23.6642278Z I1204 09:12:00.596000 14796 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 14850 2025-12-04T09:13:23.6643417Z I1204 09:12:00.596000 14796 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 14851 2025-12-04T09:13:23.6645955Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6648005Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6650036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6652057Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6654082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6656092Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6658237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6660336Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6661656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:13:23.6662917Z return func(*args, **kwargs) 2025-12-04T09:13:23.6663611Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6664745Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6666439Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6668091Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6669832Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6671324Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6672780Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6674327Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6675879Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6677428Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6679582Z [rank0]:E1204 
09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6681103Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6682628Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6684186Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6686264Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 720306176 and is now 749666304. 2025-12-04T09:13:23.6688205Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6689332Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6691055Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6692545Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6693739Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6695093Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.6696208Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6697598Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6699289Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6700959Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6702607Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6704143Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6706178Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6707795Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6709510Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6711148Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6712705Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6714211Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6715729Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6717281Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6719363Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 609157120 and is now 640614400. 
2025-12-04T09:13:23.6721696Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6722874Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6724656Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6726264Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6727497Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6728913Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.6730065Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6731199Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6733008Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6734709Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6736263Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6738008Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6739518Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6741134Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6742852Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6744460Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6746072Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6747619Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6749246Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6750690Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6752594Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 604962816 and is now 640614400. 2025-12-04T09:13:23.6754370Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6755405Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6757054Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6758368Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6759467Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6760718Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.6761729Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6762749Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6764256Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6765961Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6767494Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6768946Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6770368Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6771930Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6773434Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6774924Z 
[rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6776521Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6778285Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6779857Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6781463Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6783592Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 607059968 and is now 640614400. 2025-12-04T09:13:23.6785590Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6786839Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6788632Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6790069Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6791156Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6792407Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.6793124Z dist init r=0, world=4 2025-12-04T09:13:23.6793386Z dist init r=3, world=4 2025-12-04T09:13:23.6793629Z dist init r=2, world=4 2025-12-04T09:13:23.6793880Z dist init r=1, world=4 2025-12-04T09:13:23.6795088Z [rank0]:[W1204 09:12:07.428003541 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:13:23.6796310Z FAILED [8.6219s] [ 50%] 2025-12-04T09:13:23.6796643Z 2025-12-04T09:13:23.6796788Z =================================== FAILURES =================================== 2025-12-04T09:13:23.6797308Z ____________________ TestPureFP16CUDA.test_fp16_dtypes_cuda ____________________ 2025-12-04T09:13:23.6797794Z Traceback (most recent call last): 2025-12-04T09:13:23.6798528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:13:23.6799289Z self._join_processes(fn) 2025-12-04T09:13:23.6800043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:13:23.6800863Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:13:23.6801742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:13:23.6802560Z raise RuntimeError(error) 2025-12-04T09:13:23.6802986Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:13:23.6803444Z Traceback (most recent call last): 2025-12-04T09:13:23.6804183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6804938Z getattr(self, test_name)() 2025-12-04T09:13:23.6805652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6806379Z fn() 2025-12-04T09:13:23.6806990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6807707Z method(*args, **kwargs) 2025-12-04T09:13:23.6808452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6809134Z method(*args, **kwargs) 2025-12-04T09:13:23.6810002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6810776Z with policy(): 2025-12-04T09:13:23.6811376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6812232Z raise RuntimeError(msg) 2025-12-04T09:13:23.6813855Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 720306176 and is now 749666304. 2025-12-04T09:13:23.6815049Z 2025-12-04T09:13:23.6815269Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6816095Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6816799Z 2025-12-04T09:13:23.6817232Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6817653Z 2025-12-04T09:13:23.6817658Z 2025-12-04T09:13:23.6817884Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:13:23.6818517Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:13:23.6819764Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-e1278d34de852f2a.xml - 2025-12-04T09:13:23.6821127Z =========================== short test summary info ============================ 2025-12-04T09:13:23.6822169Z FAILED [8.6219s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:13:23.6823121Z Traceback (most recent call last): 2025-12-04T09:13:23.6823904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6824717Z getattr(self, test_name)() 2025-12-04T09:13:23.6825477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6826256Z fn() 2025-12-04T09:13:23.6826896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6827663Z method(*args, **kwargs) 2025-12-04T09:13:23.6828384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6829137Z method(*args, **kwargs) 2025-12-04T09:13:23.6829973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6830736Z with policy(): 2025-12-04T09:13:23.6831717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6832635Z raise RuntimeError(msg) 2025-12-04T09:13:23.6833873Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 720306176 and is now 749666304. 2025-12-04T09:13:23.6835015Z 2025-12-04T09:13:23.6835241Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6836104Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6836743Z 2025-12-04T09:13:23.6837005Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6837591Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:13:23.6838063Z ============================== 1 failed in 8.83s =============================== 2025-12-04T09:13:23.6838442Z Got exit code 1 2025-12-04T09:13:23.6838712Z Retrying single test... 
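The RuntimeError above is raised by the `with policy():` context manager in common_utils.py, i.e. the CUDA memory-leak check that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 turns on: it records caching-allocator and CUDA-driver memory before the test body and compares again on exit, which is where the 512 -> 6656 byte delta reported for every rank comes from. The sketch below is only a rough stand-alone approximation of the caching-allocator half of that comparison, not the actual PyTorch implementation; the function name check_for_cuda_leak and the toy workload are made up for illustration.

import gc
import torch

def check_for_cuda_leak(fn, device=0):
    # Rough approximation of the leak check: record caching-allocator usage
    # before the test body, run it, then compare again after dead tensors
    # have been collected. This is NOT the code in common_utils.py.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    before = torch.cuda.memory_allocated(device)

    fn()  # the test body under scrutiny

    gc.collect()
    torch.cuda.synchronize(device)
    after = torch.cuda.memory_allocated(device)
    if after > before:
        # The real check also compares CUDA-driver allocated memory and
        # prints the repro command seen in the log above.
        raise RuntimeError(
            f"possible CUDA leak: caching allocator went from {before} to {after} bytes"
        )

if torch.cuda.is_available():
    check_for_cuda_leak(lambda: torch.ones(1, device="cuda").sum().item())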
2025-12-04T09:13:23.6839558Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-efcb608498b7750d.xml 2025-12-04T09:13:23.6840515Z ============================= test session starts ============================== 2025-12-04T09:13:23.6841150Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.6841864Z cachedir: .pytest_cache 2025-12-04T09:13:23.6842557Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.6843321Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.6843662Z configfile: pytest.ini 2025-12-04T09:13:23.6844374Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.6845239Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T09:13:23.6846276Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda 2025-12-04T09:13:23.6847113Z Running 1 items in this shard 2025-12-04T09:13:23.6847333Z 2025-12-04T09:13:23.6848210Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda I1204 09:12:14.094000 15133 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 15185 2025-12-04T09:13:23.6849884Z I1204 09:12:14.095000 15133 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 15186 2025-12-04T09:13:23.6851083Z I1204 09:12:14.095000 15133 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 15187 2025-12-04T09:13:23.6852091Z I1204 09:12:14.096000 15133 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 15188 2025-12-04T09:13:23.6854194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6855993Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6858280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6860314Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6862327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:13:23.6864341Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6866365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6868372Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6869801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:13:23.6870906Z return func(*args, **kwargs) 2025-12-04T09:13:23.6871522Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6872595Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6874107Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6875576Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6877041Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6878406Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6879746Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6881173Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6882582Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6883997Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6885422Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6886805Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6888247Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6889665Z [rank0]:E1204 09:12:20.972000 
15185 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6891554Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 718209024 and is now 749666304. 2025-12-04T09:13:23.6893324Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6894374Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6895960Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6897566Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6898809Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6900224Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.6901445Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6902569Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6904256Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6905920Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6907572Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6909328Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6910662Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6912277Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6913707Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6915137Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6916558Z [rank3]:E1204 09:12:20.972000 
15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6917924Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6919367Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6920940Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6923227Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 604962816 and is now 640614400. 2025-12-04T09:13:23.6925223Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6926390Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6928172Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6929655Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6930887Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6932409Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.6933731Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6934745Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6936237Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6938058Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6939694Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6941236Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6942754Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6944355Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6945953Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6947533Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6949316Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6950766Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6952163Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6953593Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6955474Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 611254272 and is now 640614400. 
2025-12-04T09:13:23.6957261Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6958309Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6959901Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6961220Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6962299Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6963620Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.6964644Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6965657Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6967140Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6968613Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6970084Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6971454Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6972806Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6974216Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6975646Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6977351Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6979022Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6980590Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6982144Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6983746Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6985882Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 607059968 and is now 640614400. 2025-12-04T09:13:23.6987872Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6989048Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6990730Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6992040Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6993193Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6994445Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.6995141Z dist init r=2, world=4 2025-12-04T09:13:23.6995400Z dist init r=0, world=4 2025-12-04T09:13:23.6995652Z dist init r=3, world=4 2025-12-04T09:13:23.6995888Z dist init r=1, world=4 2025-12-04T09:13:23.6997083Z [rank0]:[W1204 09:12:21.986184200 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:13:23.6998329Z FAILED [9.2338s] [100%] 2025-12-04T09:13:23.6998489Z 2025-12-04T09:13:23.6998635Z =================================== FAILURES =================================== 2025-12-04T09:13:23.6999113Z ____________________ TestPureFP16CUDA.test_fp16_dtypes_cuda ____________________ 2025-12-04T09:13:23.6999576Z Traceback (most recent call last): 2025-12-04T09:13:23.7000296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:13:23.7001014Z self._join_processes(fn) 2025-12-04T09:13:23.7001720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:13:23.7002497Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:13:23.7003289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:13:23.7004065Z raise RuntimeError(error) 2025-12-04T09:13:23.7004462Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:13:23.7004903Z Traceback (most recent call last): 2025-12-04T09:13:23.7005602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7006364Z getattr(self, test_name)() 2025-12-04T09:13:23.7007041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7007731Z fn() 2025-12-04T09:13:23.7008316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7008983Z method(*args, **kwargs) 2025-12-04T09:13:23.7009626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7010299Z method(*args, **kwargs) 2025-12-04T09:13:23.7010921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7011597Z with policy(): 2025-12-04T09:13:23.7012206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7012897Z raise RuntimeError(msg) 2025-12-04T09:13:23.7013991Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 604962816 and is now 640614400. 2025-12-04T09:13:23.7015040Z 2025-12-04T09:13:23.7015232Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7016014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7016682Z 2025-12-04T09:13:23.7017119Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7017600Z 2025-12-04T09:13:23.7017604Z 2025-12-04T09:13:23.7017833Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:13:23.7018462Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:13:23.7019724Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-efcb608498b7750d.xml - 2025-12-04T09:13:23.7021060Z =========================== short test summary info ============================ 2025-12-04T09:13:23.7022083Z FAILED [9.2338s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:13:23.7023050Z Traceback (most recent call last): 2025-12-04T09:13:23.7023854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7024668Z getattr(self, test_name)() 2025-12-04T09:13:23.7025414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7026195Z fn() 2025-12-04T09:13:23.7026850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7027598Z method(*args, **kwargs) 2025-12-04T09:13:23.7028313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7029078Z method(*args, **kwargs) 2025-12-04T09:13:23.7029787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7030537Z with policy(): 2025-12-04T09:13:23.7031222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7031994Z raise RuntimeError(msg) 2025-12-04T09:13:23.7033401Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 604962816 and is now 640614400. 2025-12-04T09:13:23.7034442Z 2025-12-04T09:13:23.7034635Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7035416Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7036013Z 2025-12-04T09:13:23.7036255Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7036789Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:13:23.7037231Z ======================= 1 failed, 1 deselected in 9.45s ======================== 2025-12-04T09:13:23.7037615Z Got exit code 1 2025-12-04T09:13:23.7037863Z Retrying single test... 
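Each attempt also logs the `_init_utils.py:571` UserWarning that FSDP received `device_id` as the bare device "cuda" with no explicit index, and the warning itself names the two remedies: call `torch.cuda.set_device()` before FSDP initialization, or pass an explicit device index as `device_id`. A minimal sketch of that pattern follows; `rank` and `model` are placeholders rather than anything taken from this test, and actually running it presupposes an initialized process group.

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model_for_rank(model: torch.nn.Module, rank: int) -> FSDP:
    # Pin the current CUDA device to this rank so a bare "cuda" is unambiguous ...
    torch.cuda.set_device(rank)
    # ... and/or hand FSDP an explicit index instead of the bare "cuda" device.
    return FSDP(model, device_id=rank)

The companion c10d_logger warning about barrier() points at the same knob, noting that `device_id` can also be passed to `init_process_group` to silence it.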
2025-12-04T09:13:23.7038623Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9a300aee582fd0b6.xml 2025-12-04T09:13:23.7039493Z ============================= test session starts ============================== 2025-12-04T09:13:23.7040084Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.7040624Z cachedir: .pytest_cache 2025-12-04T09:13:23.7041240Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.7041938Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.7042251Z configfile: pytest.ini 2025-12-04T09:13:23.7042887Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.7043771Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T09:13:23.7044624Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda 2025-12-04T09:13:23.7045391Z Running 1 items in this shard 2025-12-04T09:13:23.7045579Z 2025-12-04T09:13:23.7046385Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda I1204 09:12:28.004000 15470 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 15522 2025-12-04T09:13:23.7047751Z I1204 09:12:28.005000 15470 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 15523 2025-12-04T09:13:23.7048777Z I1204 09:12:28.006000 15470 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 15524 2025-12-04T09:13:23.7049786Z I1204 09:12:28.006000 15470 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 15525 2025-12-04T09:13:23.7051899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.7053684Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.7055481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.7057576Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.7059679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:13:23.7061705Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.7063718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.7065720Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.7067035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:13:23.7068293Z return func(*args, **kwargs) 2025-12-04T09:13:23.7069085Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7070207Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7071707Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7073179Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7074702Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7076075Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7077409Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7078823Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7080242Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7081663Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7083081Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7084456Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7085841Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7087272Z [rank0]:E1204 09:12:34.860000 
15522 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7089224Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 720306176 and is now 749666304. 2025-12-04T09:13:23.7091000Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7092040Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7093614Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7094936Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7096034Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7097568Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.7098716Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7099851Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7101533Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7103304Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7104949Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7106486Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7108004Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7109666Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7111084Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7112501Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7113925Z [rank1]:E1204 09:12:34.861000 
15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7115306Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7116874Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7118395Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7120451Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 640614400. 2025-12-04T09:13:23.7122850Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7124036Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7125824Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7127324Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7128543Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7129954Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.7131101Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7132239Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7134109Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7135691Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7137458Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7139004Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7140520Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7142108Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7143723Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7145327Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7146933Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7148499Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7150049Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7151475Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7153446Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 607059968 and is now 640614400. 
2025-12-04T09:13:23.7155217Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7156266Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7157834Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7159153Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7160248Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7161513Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.7162524Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7163534Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7165081Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7166555Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7168020Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7169372Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7170716Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7172140Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7173562Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7174966Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7176390Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7178131Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7179703Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7181396Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7183521Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 604962816 and is now 640614400. 2025-12-04T09:13:23.7185519Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7186697Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7188501Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7190090Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7191176Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7192433Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.7193148Z dist init r=3, world=4 2025-12-04T09:13:23.7193412Z dist init r=1, world=4 2025-12-04T09:13:23.7193712Z dist init r=2, world=4 2025-12-04T09:13:23.7206928Z dist init r=0, world=4 2025-12-04T09:13:23.7208263Z [rank0]:[W1204 09:12:35.871466517 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T09:13:23.7209605Z FAILED [8.5903s] [100%]
2025-12-04T09:13:23.7209794Z 
2025-12-04T09:13:23.7209941Z =================================== FAILURES ===================================
2025-12-04T09:13:23.7210469Z ____________________ TestPureFP16CUDA.test_fp16_dtypes_cuda ____________________
2025-12-04T09:13:23.7210947Z Traceback (most recent call last):
2025-12-04T09:13:23.7211708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T09:13:23.7212475Z self._join_processes(fn)
2025-12-04T09:13:23.7213235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T09:13:23.7214065Z self._check_return_codes(fn, elapsed_time)
2025-12-04T09:13:23.7214907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T09:13:23.7215735Z raise RuntimeError(error)
2025-12-04T09:13:23.7216152Z RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:13:23.7216732Z Traceback (most recent call last):
2025-12-04T09:13:23.7217689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:13:23.7218496Z getattr(self, test_name)()
2025-12-04T09:13:23.7219249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:13:23.7220039Z fn()
2025-12-04T09:13:23.7220705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7221670Z method(*args, **kwargs)
2025-12-04T09:13:23.7222567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7223333Z method(*args, **kwargs)
2025-12-04T09:13:23.7224047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:13:23.7224791Z with policy():
2025-12-04T09:13:23.7225477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:13:23.7226252Z raise RuntimeError(msg)
2025-12-04T09:13:23.7227488Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 640614400.
2025-12-04T09:13:23.7228679Z 
2025-12-04T09:13:23.7228895Z To execute this test, run the following from the base repo dir:
2025-12-04T09:13:23.7229787Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda
2025-12-04T09:13:23.7230447Z 
2025-12-04T09:13:23.7230729Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:13:23.7231135Z 
2025-12-04T09:13:23.7231140Z 
2025-12-04T09:13:23.7231380Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:13:23.7231998Z Process 1 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:13:23.7233447Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9a300aee582fd0b6.xml -
2025-12-04T09:13:23.7234487Z =========================== short test summary info ============================
2025-12-04T09:13:23.7235496Z FAILED [8.5903s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:13:23.7236336Z Traceback (most recent call last):
2025-12-04T09:13:23.7237053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:13:23.7237776Z getattr(self, test_name)()
2025-12-04T09:13:23.7238441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:13:23.7239139Z fn()
2025-12-04T09:13:23.7239723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7240404Z method(*args, **kwargs)
2025-12-04T09:13:23.7241033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7241712Z method(*args, **kwargs)
2025-12-04T09:13:23.7242347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:13:23.7243025Z with policy():
2025-12-04T09:13:23.7243624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:13:23.7244316Z raise RuntimeError(msg)
2025-12-04T09:13:23.7245433Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 640614400.
2025-12-04T09:13:23.7246471Z 
2025-12-04T09:13:23.7246678Z To execute this test, run the following from the base repo dir:
2025-12-04T09:13:23.7247450Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda
2025-12-04T09:13:23.7248052Z 
2025-12-04T09:13:23.7248291Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:13:23.7248827Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:13:23.7249327Z ======================= 1 failed, 1 deselected in 8.80s ======================== 2025-12-04T09:13:23.7249714Z Got exit code 1 2025-12-04T09:13:23.7250266Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda 2025-12-04T09:13:23.7251154Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:13:23.7252235Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-433868368b6a29b3.xml 2025-12-04T09:13:23.7253109Z ============================= test session starts ============================== 2025-12-04T09:13:23.7253711Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.7254429Z cachedir: .pytest_cache 2025-12-04T09:13:23.7255094Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.7256020Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.7256369Z configfile: pytest.ini 2025-12-04T09:13:23.7257326Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.7258221Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T09:13:23.7258712Z stepcurrent: skipping 1 already run items. 2025-12-04T09:13:23.7259095Z Running 1 items in this shard 2025-12-04T09:13:23.7259306Z 2025-12-04T09:13:23.7260252Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda I1204 09:12:41.514000 15807 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 15859 2025-12-04T09:13:23.7261911Z I1204 09:12:41.515000 15807 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 15860 2025-12-04T09:13:23.7263040Z I1204 09:12:41.516000 15807 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 15861 2025-12-04T09:13:23.7264167Z I1204 09:12:41.516000 15807 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 15862 2025-12-04T09:13:23.7265813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:13:23.7267069Z return func(*args, **kwargs) 2025-12-04T09:13:23.7267736Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7269162Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7270663Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7272128Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7273593Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7274955Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7276305Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7277784Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7279209Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7280638Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7282048Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7283428Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7284987Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7286498Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7288541Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:13:23.7290500Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7291608Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7293328Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7294754Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7295913Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7297495Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.7298660Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7299807Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7301495Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7303138Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7304793Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7306337Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7307919Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7309675Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7311083Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7312502Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7313923Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7315317Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7316706Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7318121Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7320051Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 609157120 and is now 630128640. 2025-12-04T09:13:23.7322492Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7323676Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7325489Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7326996Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7328226Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7329644Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.7330791Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7331913Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7333691Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7335294Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7336980Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7338527Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7340155Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7341766Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7343371Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:13:23.7344971Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7346581Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7348133Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7349835Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7351264Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7353184Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:13:23.7355059Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7356091Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7357702Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7359044Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7360150Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7361398Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.7362430Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7363437Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7364931Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7366405Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7367927Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7369294Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] 
fn() 2025-12-04T09:13:23.7370636Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7372050Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7373470Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7374885Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7376320Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7378080Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7379650Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7381247Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7383485Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 611254272 and is now 630128640. 2025-12-04T09:13:23.7385517Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7386692Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7388476Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7389984Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7391067Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7392301Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.7392985Z dist init r=2, world=4 2025-12-04T09:13:23.7393229Z dist init r=0, world=4 2025-12-04T09:13:23.7393468Z dist init r=1, world=4 2025-12-04T09:13:23.7393708Z dist init r=3, world=4 2025-12-04T09:13:23.7394882Z [rank0]:[W1204 09:12:48.383065767 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T09:13:23.7396115Z FAILED [9.2059s] [100%]
2025-12-04T09:13:23.7396266Z 
2025-12-04T09:13:23.7396404Z =================================== FAILURES ===================================
2025-12-04T09:13:23.7396884Z ________________ TestPureFP16CUDA.test_pure_fp16_training_cuda _________________
2025-12-04T09:13:23.7397396Z Traceback (most recent call last):
2025-12-04T09:13:23.7398090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T09:13:23.7398792Z self._join_processes(fn)
2025-12-04T09:13:23.7399484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T09:13:23.7400252Z self._check_return_codes(fn, elapsed_time)
2025-12-04T09:13:23.7401029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T09:13:23.7401801Z raise RuntimeError(error)
2025-12-04T09:13:23.7402182Z RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:13:23.7402613Z Traceback (most recent call last):
2025-12-04T09:13:23.7403306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:13:23.7404000Z getattr(self, test_name)()
2025-12-04T09:13:23.7404660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:13:23.7405335Z fn()
2025-12-04T09:13:23.7405902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7406558Z method(*args, **kwargs)
2025-12-04T09:13:23.7407177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7407917Z method(*args, **kwargs)
2025-12-04T09:13:23.7408529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:13:23.7409184Z with policy():
2025-12-04T09:13:23.7409783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:13:23.7410456Z raise RuntimeError(msg)
2025-12-04T09:13:23.7411572Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 609157120 and is now 630128640.
2025-12-04T09:13:23.7412641Z 
2025-12-04T09:13:23.7412828Z To execute this test, run the following from the base repo dir:
2025-12-04T09:13:23.7413626Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda
2025-12-04T09:13:23.7414235Z 
2025-12-04T09:13:23.7414476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:13:23.7414829Z 
2025-12-04T09:13:23.7414833Z 
2025-12-04T09:13:23.7415026Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:13:23.7415569Z Process 1 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:13:23.7416736Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-433868368b6a29b3.xml -
2025-12-04T09:13:23.7418034Z =========================== short test summary info ============================
2025-12-04T09:13:23.7419065Z FAILED [9.2059s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:13:23.7420038Z Traceback (most recent call last):
2025-12-04T09:13:23.7420987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:13:23.7421779Z getattr(self, test_name)()
2025-12-04T09:13:23.7422626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:13:23.7423393Z fn()
2025-12-04T09:13:23.7424038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7424796Z method(*args, **kwargs)
2025-12-04T09:13:23.7425496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7426250Z method(*args, **kwargs)
2025-12-04T09:13:23.7426955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:13:23.7427690Z with policy():
2025-12-04T09:13:23.7428367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:13:23.7429117Z raise RuntimeError(msg)
2025-12-04T09:13:23.7430399Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 609157120 and is now 630128640.
2025-12-04T09:13:23.7431609Z 
2025-12-04T09:13:23.7431820Z To execute this test, run the following from the base repo dir:
2025-12-04T09:13:23.7432821Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda
2025-12-04T09:13:23.7433549Z 
2025-12-04T09:13:23.7433781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:13:23.7434298Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:13:23.7435835Z ======================= 1 failed, 1 deselected in 9.41s ========================
2025-12-04T09:13:23.7436202Z Got exit code 1
2025-12-04T09:13:23.7436431Z Retrying single test...
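Both failing tests above show the same pattern: the mem_leak_check run flags growth in the CUDA caching allocator, and every run also prints the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit, plus the barrier() UserWarning about device_id. For reference, the sketch below shows the explicit setup/teardown shape those two warnings ask for. It is a minimal illustration, not code taken from test_fsdp_pure_fp16.py; the helper name _toy_step and the 127.0.0.1:29500 rendezvous defaults are assumptions made only for this example.

import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def _toy_step(rank: int, world_size: int) -> None:
    # One NCCL rank per GPU, the same shape as the FSDP tests in this log.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.cuda.set_device(rank)
    # Passing device_id is what the barrier() UserWarning above suggests.
    dist.init_process_group(
        "nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device(f"cuda:{rank}"),
    )
    try:
        t = torch.ones(8, device=f"cuda:{rank}", dtype=torch.float16)
        dist.all_reduce(t)  # stand-in for the real test body
        dist.barrier()
    finally:
        # Explicit teardown, so the "destroy_process_group() was not called
        # before program exit" warning does not fire.
        dist.destroy_process_group()


if __name__ == "__main__":
    world = torch.cuda.device_count()
    mp.spawn(_toy_step, args=(world,), nprocs=world, join=True)

Run on a multi-GPU host this exits cleanly; it does not reproduce the leak itself, for which the repro command printed above (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py ...) is the intended entry point.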
2025-12-04T09:13:23.7437179Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-cb48c540b8fb2acf.xml 2025-12-04T09:13:23.7438028Z ============================= test session starts ============================== 2025-12-04T09:13:23.7438602Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.7439125Z cachedir: .pytest_cache 2025-12-04T09:13:23.7439732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.7440415Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.7440724Z configfile: pytest.ini 2025-12-04T09:13:23.7441348Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.7442125Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T09:13:23.7443070Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T09:13:23.7443852Z Running 1 items in this shard 2025-12-04T09:13:23.7444039Z 2025-12-04T09:13:23.7445325Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda I1204 09:12:55.504000 16144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 16196 2025-12-04T09:13:23.7446799Z I1204 09:12:55.505000 16144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 16197 2025-12-04T09:13:23.7447853Z I1204 09:12:55.506000 16144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 16198 2025-12-04T09:13:23.7448931Z I1204 09:12:55.506000 16144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 16199 2025-12-04T09:13:23.7450540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:13:23.7451701Z return func(*args, **kwargs) 2025-12-04T09:13:23.7452325Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7453381Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7454959Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7456785Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7458589Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7460117Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7461618Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7463208Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7464784Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7466433Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7468018Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7469666Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7471245Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7472735Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7474771Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 718209024 and is now 739180544. 
2025-12-04T09:13:23.7476664Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7477759Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7479454Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7480861Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7482078Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7483390Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.7484450Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7485498Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7487071Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7488844Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7490320Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7491691Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7493025Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7494451Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7495950Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7497690Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7499296Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7500847Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7502411Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7504025Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7506199Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 611254272 and is now 630128640. 2025-12-04T09:13:23.7508214Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7509552Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7511170Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7512562Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7513657Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7514897Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.7515916Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7516921Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7518422Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7519884Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7521685Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7523221Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7524738Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7526457Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7528058Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:13:23.7529646Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7531239Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7532796Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7534380Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7535804Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7538084Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:13:23.7540119Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7541306Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7543269Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7544798Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7546026Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7547443Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.7548599Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7549788Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7551371Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7552931Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7554676Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7556161Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] 
fn() 2025-12-04T09:13:23.7557683Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7559222Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7560844Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7562349Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7563850Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7565402Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7566778Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7568208Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7570127Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:13:23.7571923Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7573018Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7574613Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7576147Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7577542Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7578955Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.7579741Z dist init r=3, world=4 2025-12-04T09:13:23.7580031Z dist init r=1, world=4 2025-12-04T09:13:23.7580311Z dist init r=0, world=4 2025-12-04T09:13:23.7580578Z dist init r=2, world=4 2025-12-04T09:13:23.7581918Z [rank0]:[W1204 09:13:02.390444974 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:13:23.7583312Z FAILED [9.4457s] [100%] 2025-12-04T09:13:23.7583493Z 2025-12-04T09:13:23.7583656Z =================================== FAILURES =================================== 2025-12-04T09:13:23.7584208Z ________________ TestPureFP16CUDA.test_pure_fp16_training_cuda _________________ 2025-12-04T09:13:23.7584743Z Traceback (most recent call last): 2025-12-04T09:13:23.7585537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:13:23.7586441Z self._join_processes(fn) 2025-12-04T09:13:23.7587233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:13:23.7588113Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:13:23.7589111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:13:23.7590128Z raise RuntimeError(error) 2025-12-04T09:13:23.7590532Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:13:23.7590976Z Traceback (most recent call last): 2025-12-04T09:13:23.7591678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7592377Z getattr(self, test_name)() 2025-12-04T09:13:23.7593055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7593743Z fn() 2025-12-04T09:13:23.7594307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7594977Z method(*args, **kwargs) 2025-12-04T09:13:23.7595600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7596270Z method(*args, **kwargs) 2025-12-04T09:13:23.7596888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7597544Z with policy(): 2025-12-04T09:13:23.7598146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7598825Z raise RuntimeError(msg) 2025-12-04T09:13:23.7599949Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 611254272 and is now 630128640. 
2025-12-04T09:13:23.7601016Z 2025-12-04T09:13:23.7601259Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7602063Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7602673Z 2025-12-04T09:13:23.7602915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7603270Z 2025-12-04T09:13:23.7603412Z Process 3 exited with error code 10 and exception: 2025-12-04T09:13:23.7603774Z Traceback (most recent call last): 2025-12-04T09:13:23.7604470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7605178Z getattr(self, test_name)() 2025-12-04T09:13:23.7605830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7606504Z fn() 2025-12-04T09:13:23.7607074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7607729Z method(*args, **kwargs) 2025-12-04T09:13:23.7608353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7609021Z method(*args, **kwargs) 2025-12-04T09:13:23.7609640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7610288Z with policy(): 2025-12-04T09:13:23.7610888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7611614Z raise RuntimeError(msg) 2025-12-04T09:13:23.7612737Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:13:23.7613802Z 2025-12-04T09:13:23.7613988Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7614785Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7615395Z 2025-12-04T09:13:23.7615642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7615991Z 2025-12-04T09:13:23.7615995Z 2025-12-04T09:13:23.7616195Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:13:23.7616816Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:13:23.7618235Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-cb48c540b8fb2acf.xml - 2025-12-04T09:13:23.7619396Z =========================== short test summary info ============================ 2025-12-04T09:13:23.7620484Z FAILED [9.4457s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:13:23.7621634Z Traceback (most recent call last): 2025-12-04T09:13:23.7622418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7623214Z getattr(self, test_name)() 2025-12-04T09:13:23.7623950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7624717Z fn() 2025-12-04T09:13:23.7625354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7626101Z method(*args, **kwargs) 2025-12-04T09:13:23.7626907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7627655Z method(*args, **kwargs) 2025-12-04T09:13:23.7628363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7629100Z with policy(): 2025-12-04T09:13:23.7629778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7630538Z raise RuntimeError(msg) 2025-12-04T09:13:23.7631809Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 611254272 and is now 630128640. 
2025-12-04T09:13:23.7633111Z 2025-12-04T09:13:23.7633320Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7634163Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7634819Z 2025-12-04T09:13:23.7635067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7635441Z 2025-12-04T09:13:23.7635599Z Process 3 exited with error code 10 and exception: 2025-12-04T09:13:23.7635976Z Traceback (most recent call last): 2025-12-04T09:13:23.7636710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7637533Z getattr(self, test_name)() 2025-12-04T09:13:23.7638196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7638934Z fn() 2025-12-04T09:13:23.7639507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7640180Z method(*args, **kwargs) 2025-12-04T09:13:23.7640811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7641471Z method(*args, **kwargs) 2025-12-04T09:13:23.7642091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7642754Z with policy(): 2025-12-04T09:13:23.7643344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7644017Z raise RuntimeError(msg) 2025-12-04T09:13:23.7645145Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:13:23.7646204Z 2025-12-04T09:13:23.7646411Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7647204Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7647820Z 2025-12-04T09:13:23.7648053Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7648570Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:13:23.7649012Z ======================= 1 failed, 1 deselected in 9.66s ======================== 2025-12-04T09:13:23.7649370Z Got exit code 1 2025-12-04T09:13:23.7649601Z Retrying single test... 
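[editor's note] The "CUDA driver API confirmed a leak" failures above come from the harness enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, which snapshots caching-allocator and driver-level memory before the test and compares again afterwards. The following is only a minimal sketch of that before/after comparison using public torch.cuda APIs; the helper name run_with_leak_check is illustrative and this is not the real implementation in torch/testing/_internal/common_utils.py.

    import torch

    def run_with_leak_check(test_fn, device=0):
        # Snapshot memory before the test: allocator bytes plus the driver-level view.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)       # caching allocator bytes
        free_before, total = torch.cuda.mem_get_info(device)     # driver free/total bytes
        driver_before = total - free_before

        test_fn()

        # Snapshot again after the test has finished and queues are drained.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        # Only flag a leak when the driver-side numbers confirm the allocator's growth,
        # mirroring the "CUDA driver API confirmed a leak" wording in the log above.
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible leak: allocator {alloc_before} -> {alloc_after}, "
                f"driver {driver_before} -> {driver_after}"
            )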
2025-12-04T09:13:23.7650356Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f306b72badd85355.xml 2025-12-04T09:13:23.7651211Z ============================= test session starts ============================== 2025-12-04T09:13:23.7651835Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.7652358Z cachedir: .pytest_cache 2025-12-04T09:13:23.7652978Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.7653656Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.7653963Z configfile: pytest.ini 2025-12-04T09:13:23.7654599Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.7655370Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T09:13:23.7656228Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T09:13:23.7657281Z Running 1 items in this shard 2025-12-04T09:13:23.7657487Z 2025-12-04T09:13:23.7658429Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda I1204 09:13:09.424000 16481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 16533 2025-12-04T09:13:23.7659984Z I1204 09:13:09.425000 16481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 16534 2025-12-04T09:13:23.7661107Z I1204 09:13:09.426000 16481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 16535 2025-12-04T09:13:23.7662230Z I1204 09:13:09.427000 16481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 16536 2025-12-04T09:13:23.7663874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
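[editor's note] The UserWarning at the end of the block above ("barrier(): using the device under current context") suggests passing `device_id` to `init_process_group`. A minimal sketch of that remedy follows, assuming a PyTorch version whose `init_process_group` accepts `device_id` and that rank information comes from the usual environment variables; the function name and env-var handling are illustrative, not the test suite's own code.

    import os
    import torch
    import torch.distributed as dist

    def init_distributed():
        # Bind this rank to one GPU up front so collectives like barrier()
        # do not have to guess the device from the current context.
        rank = int(os.environ["RANK"])
        local_device = torch.device("cuda", rank % torch.cuda.device_count())
        torch.cuda.set_device(local_device)
        dist.init_process_group(
            backend="nccl",
            device_id=local_device,  # silences the barrier() device warning above
        )
        return local_device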
2025-12-04T09:13:23.7665181Z return func(*args, **kwargs) 2025-12-04T09:13:23.7665840Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7666967Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7668642Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7670358Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7671809Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7673167Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7674516Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7675927Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7677343Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7678738Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7680196Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7681573Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7682952Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7684373Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7686274Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 716111872 and is now 739180544. 
2025-12-04T09:13:23.7688079Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7689115Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7690721Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7692052Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7693202Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7694452Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.7695461Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7696521Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7698319Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7699976Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7701618Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7703142Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7704649Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7706228Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7707820Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7709581Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7711034Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7712417Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7713785Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7715199Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7717118Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 611254272 and is now 630128640. 2025-12-04T09:13:23.7718902Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7719935Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7721886Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7723487Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7724713Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7726115Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.7727251Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7728371Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7730047Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7731697Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7733418Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7734765Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7736104Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7737842Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7739434Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:13:23.7741103Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7742683Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7744478Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7746031Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7747638Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7749834Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 609157120 and is now 630128640. 2025-12-04T09:13:23.7751616Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7752854Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7754746Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7756268Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7757456Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7758799Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.7759895Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7760976Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7762616Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7764199Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7765782Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7767263Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] 
fn() 2025-12-04T09:13:23.7768800Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7770331Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7772000Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7773499Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7774986Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7776508Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7778222Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7779824Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7781985Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:13:23.7784297Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7785464Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7787356Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7788948Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7790092Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7791403Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.7792146Z dist init r=3, world=4 2025-12-04T09:13:23.7792397Z dist init r=2, world=4 2025-12-04T09:13:23.7792652Z dist init r=0, world=4 2025-12-04T09:13:23.7792908Z dist init r=1, world=4 2025-12-04T09:13:23.7794146Z [rank0]:[W1204 09:13:16.403617794 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:13:23.7795461Z FAILED [9.2136s] [100%] 2025-12-04T09:13:23.7795633Z 2025-12-04T09:13:23.7795769Z =================================== FAILURES =================================== 2025-12-04T09:13:23.7796383Z ________________ TestPureFP16CUDA.test_pure_fp16_training_cuda _________________ 2025-12-04T09:13:23.7796836Z Traceback (most recent call last): 2025-12-04T09:13:23.7797544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:13:23.7798260Z self._join_processes(fn) 2025-12-04T09:13:23.7798981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:13:23.7799753Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:13:23.7800534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:13:23.7801388Z raise RuntimeError(error) 2025-12-04T09:13:23.7801773Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:13:23.7802199Z Traceback (most recent call last): 2025-12-04T09:13:23.7802883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7803584Z getattr(self, test_name)() 2025-12-04T09:13:23.7804241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7804920Z fn() 2025-12-04T09:13:23.7805484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7806146Z method(*args, **kwargs) 2025-12-04T09:13:23.7806771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7807437Z method(*args, **kwargs) 2025-12-04T09:13:23.7808062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7808714Z with policy(): 2025-12-04T09:13:23.7809317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7809984Z raise RuntimeError(msg) 2025-12-04T09:13:23.7811104Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 611254272 and is now 630128640. 2025-12-04T09:13:23.7812217Z 2025-12-04T09:13:23.7812403Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7813202Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7813830Z 2025-12-04T09:13:23.7814064Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7814420Z 2025-12-04T09:13:23.7814424Z 2025-12-04T09:13:23.7814631Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:13:23.7815171Z Process 2 terminated with exit code 10, terminating remaining processes. 
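[editor's note] The ProcessGroupNCCL warning above reports that destroy_process_group() was never called before process exit. A minimal teardown sketch is shown below; the function name is illustrative and the real tests manage their process groups through the MultiProcessTestCase harness rather than code like this.

    import torch.distributed as dist

    def teardown_distributed():
        if dist.is_initialized():
            dist.barrier()                # let all ranks finish outstanding collectives
            dist.destroy_process_group()  # release NCCL communicators before exit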
2025-12-04T09:13:23.7816280Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f306b72badd85355.xml - 2025-12-04T09:13:23.7817620Z =========================== short test summary info ============================ 2025-12-04T09:13:23.7818664Z FAILED [9.2136s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:13:23.7819632Z Traceback (most recent call last): 2025-12-04T09:13:23.7820416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7821402Z getattr(self, test_name)() 2025-12-04T09:13:23.7822149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7822901Z fn() 2025-12-04T09:13:23.7823532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7824271Z method(*args, **kwargs) 2025-12-04T09:13:23.7824966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7825721Z method(*args, **kwargs) 2025-12-04T09:13:23.7826428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7827172Z with policy(): 2025-12-04T09:13:23.7827965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7828727Z raise RuntimeError(msg) 2025-12-04T09:13:23.7830000Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 611254272 and is now 630128640. 2025-12-04T09:13:23.7831198Z 2025-12-04T09:13:23.7831423Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7832321Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7833116Z 2025-12-04T09:13:23.7833365Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7833918Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:13:23.7834389Z ======================= 1 failed, 1 deselected in 9.43s ======================== 2025-12-04T09:13:23.7834772Z Got exit code 1 2025-12-04T09:13:23.7835371Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T09:13:23.7836334Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:13:23.7837466Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-456a3faf0e1ca4c4.xml 2025-12-04T09:13:23.7838561Z ============================= test session starts ============================== 2025-12-04T09:13:23.7838986Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.7839091Z cachedir: .pytest_cache 2025-12-04T09:13:23.7839598Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.7839731Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.7839834Z configfile: pytest.ini 2025-12-04T09:13:23.7840358Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.7840571Z collecting ... collected 2 items / 2 deselected / 0 selected 2025-12-04T09:13:23.7840709Z stepcurrent: skipping 2 already run items. 2025-12-04T09:13:23.7840831Z Running 0 items in this shard 2025-12-04T09:13:23.7840837Z 2025-12-04T09:13:23.7841756Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-456a3faf0e1ca4c4.xml - 2025-12-04T09:13:23.7841920Z ============================ 2 deselected in 0.01s ============================= 2025-12-04T09:13:23.7842887Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda', 'test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda'] 2025-12-04T09:13:23.7842892Z 2025-12-04T09:13:23.7843503Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_pure_fp16 1/1 (test/test-reports/distributed.fsdp.test_fsdp_pure_fp16_1.1_2de43ef0fea2c555_.log) 2025-12-04T09:13:23.7843508Z 2025-12-04T09:13:23.7843891Z Finished distributed/fsdp/test_fsdp_pure_fp16 1/1 ... 
[2025-12-04 09:13:23.663258][1235.271173761], took 1.44min 2025-12-04T09:13:23.7844740Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-e1278d34de852f2a.xml 2025-12-04T09:13:23.7845615Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-efcb608498b7750d.xml 2025-12-04T09:13:23.7846509Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9a300aee582fd0b6.xml 2025-12-04T09:13:23.8095848Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-433868368b6a29b3.xml 2025-12-04T09:13:23.8427230Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-cb48c540b8fb2acf.xml 2025-12-04T09:13:23.8751958Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f306b72badd85355.xml 2025-12-04T09:13:23.9072346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-456a3faf0e1ca4c4.xml 2025-12-04T09:13:24.1178986Z Uploading logs for 57116084904 to S3 2025-12-04T09:13:24.1539342Z Uploading artifacts took 0.22 seconds 2025-12-04T09:13:24.1539803Z distributed/fsdp/test_fsdp_pure_fp16 1/1 failed! 2025-12-04T09:13:24.1543577Z Running distributed/tensor/debug/test_debug_mode 1/1 ... [2025-12-04 09:13:24.154185][1235.762101353] 2025-12-04T09:13:24.1544206Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:13:24.1546832Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/debug/test_debug_mode.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:13:24.154508] 2025-12-04T09:14:13.7418881Z 2025-12-04T09:14:13.7420103Z distributed/tensor/debug/test_debug_mode 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.debug.test_debug_mode_1.1_8a4ec9b51bad1d98_.log 2025-12-04T09:14:13.7435333Z Running 25 items in this shard: test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_check_hash_mismatches, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_check_structure_mismatches, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_check_triton_hash_mismatches, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_compile, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_backward, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_densor_redistribution_trace, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_einsum, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_higher_order_cond, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_mm, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_string_inside_context, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_fake_tensor, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_False_has_outer_mode_False, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_False_has_outer_mode_True, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_True_has_outer_mode_False, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_True_has_outer_mode_True, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nn_module, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_pretty_print_dtensor_make_fx, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_real_tensor, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_tensor_attributes, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_tensor_hash_redistribute, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_triton_kernel_logs, test/distributed/tensor/debug/test_debug_mode.py::TestDebugModeUtils::test_hash_empty_tenor, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugModeNCCLBackend::test_allgather_base, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugModeNCCLBackend::test_allgather_base_async_op, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugModeNCCLBackend::test_allgather_functional_with_async_collective_tensor 2025-12-04T09:14:13.7449218Z 2025-12-04T09:14:13.7449651Z Finished distributed/tensor/debug/test_debug_mode 1/1 ... [2025-12-04 09:14:13.741471][1285.349382303], took 0.83min 2025-12-04T09:14:13.7451101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.debug.test_debug_mode/distributed.tensor.debug.test_debug_mode-21dd2989918f2f32.xml 2025-12-04T09:14:13.8339790Z Running distributed/fsdp/test_fsdp_exec_order 1/1 ... 
[2025-12-04 09:14:13.833741][1285.441658776] 2025-12-04T09:14:13.8340444Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:14:13.8342867Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_exec_order.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:14:13.834099] 2025-12-04T09:19:34.5390824Z 2025-12-04T09:19:34.5391845Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_exec_order 1/1 (test/test-reports/distributed.fsdp.test_fsdp_exec_order_1.1_a2a67ccbd845e856_.log) 2025-12-04T09:19:34.5394296Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-93c7f0a0a61745d5.xml 2025-12-04T09:19:34.5395499Z ============================= test session starts ============================== 2025-12-04T09:19:34.5396181Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.5396765Z cachedir: .pytest_cache 2025-12-04T09:19:34.5397459Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.5398234Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.5398761Z configfile: pytest.ini 2025-12-04T09:19:34.5399493Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.5400298Z collecting ... collected 8 items 2025-12-04T09:19:34.5400727Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:19:34.5406720Z Running 8 items in this shard: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.5412471Z 2025-12-04T09:19:34.5415236Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda I1204 09:14:17.224000 19437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 19489 2025-12-04T09:19:34.5417173Z I1204 09:14:17.224000 19437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 19490 2025-12-04T09:19:34.5418326Z I1204 09:14:17.225000 19437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 19491 2025-12-04T09:19:34.5419468Z I1204 09:14:17.226000 19437 
site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 19492 2025-12-04T09:19:34.5422071Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5424126Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5426159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5428192Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5448464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5450858Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5452899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.5454939Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5455719Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5456992Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5458686Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5460330Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5461980Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5463524Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5465044Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5466773Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5468479Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5470035Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5471588Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5473106Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5474632Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5476185Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5478444Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 
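[editor's note] The UserWarning earlier in this block comes from passing `device_id="cuda"` without an index to FSDP. A sketch of the two remedies the warning itself suggests is below; `model` and `rank` are assumed to exist in the calling test, and this is illustrative code rather than the suite's own wrapper.

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(model, rank):
        # Option 1: set the current device explicitly before constructing FSDP.
        torch.cuda.set_device(rank)
        # Option 2: pass a device_id that carries an explicit index.
        return FSDP(model, device_id=torch.device("cuda", rank))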
2025-12-04T09:19:34.5480621Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5481769Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5483705Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5485321Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5486525Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5487902Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.5489017Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5490108Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5491744Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5493352Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5494958Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5496544Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5498296Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5499904Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5501510Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5503118Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5504725Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5506288Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5507860Z [rank1]:E1204 09:14:23.872000 19490 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5509537Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5511798Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 619642880. 2025-12-04T09:19:34.5513982Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5515115Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5517047Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5518687Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5519891Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5521649Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.5522792Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5523930Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5525626Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5527284Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5528929Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5530567Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5532091Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5533871Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5535381Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5537138Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5538750Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5540305Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5541868Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5543481Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5545903Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 491716608 and is now 619642880. 2025-12-04T09:19:34.5548096Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5549439Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5551206Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5552715Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5553809Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5555063Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.5556090Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5557104Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5558824Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5560383Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5562034Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5563496Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5564930Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5566426Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5567955Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5569639Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5571192Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5572692Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5574203Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5575818Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5578365Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 619642880. 
2025-12-04T09:19:34.5580553Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T09:19:34.5581727Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.5583709Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda
2025-12-04T09:19:34.5585408Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T09:19:34.5586638Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.5588052Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T09:19:34.5589047Z dist init r=3, world=4
2025-12-04T09:19:34.5589309Z dist init r=0, world=4
2025-12-04T09:19:34.5589566Z dist init r=2, world=4
2025-12-04T09:19:34.5589810Z dist init r=1, world=4
2025-12-04T09:19:34.5590069Z FAILED [8.3750s] [ 12%]
2025-12-04T09:19:34.5590241Z
2025-12-04T09:19:34.5590380Z =================================== FAILURES ===================================
2025-12-04T09:19:34.5590933Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda __
2025-12-04T09:19:34.5591499Z Traceback (most recent call last):
2025-12-04T09:19:34.5592207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T09:19:34.5592924Z self._join_processes(fn)
2025-12-04T09:19:34.5593632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T09:19:34.5594418Z self._check_return_codes(fn, elapsed_time)
2025-12-04T09:19:34.5595210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T09:19:34.5595994Z raise RuntimeError(error)
2025-12-04T09:19:34.5596395Z RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:19:34.5596843Z Traceback (most recent call last):
2025-12-04T09:19:34.5597554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.5598252Z getattr(self, test_name)()
2025-12-04T09:19:34.5598931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.5599622Z fn()
2025-12-04T09:19:34.5600202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5600870Z method(*args, **kwargs)
2025-12-04T09:19:34.5601512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5602183Z method(*args, **kwargs)
2025-12-04T09:19:34.5602874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.5603528Z with policy():
2025-12-04T09:19:34.5604144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.5604837Z raise RuntimeError(msg)
2025-12-04T09:19:34.5606110Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 619642880.
2025-12-04T09:19:34.5607333Z
2025-12-04T09:19:34.5607530Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.5608494Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda
2025-12-04T09:19:34.5609270Z
2025-12-04T09:19:34.5609525Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.5610058Z
2025-12-04T09:19:34.5610062Z
2025-12-04T09:19:34.5610298Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:19:34.5610885Z Process 1 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:19:34.5612081Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-93c7f0a0a61745d5.xml -
2025-12-04T09:19:34.5613187Z =========================== short test summary info ============================
2025-12-04T09:19:34.5614357Z FAILED [8.3750s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:19:34.5615420Z Traceback (most recent call last):
2025-12-04T09:19:34.5616174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.5617198Z getattr(self, test_name)()
2025-12-04T09:19:34.5618031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.5618796Z fn()
2025-12-04T09:19:34.5619450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5620209Z method(*args, **kwargs)
2025-12-04T09:19:34.5621090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5621858Z method(*args, **kwargs)
2025-12-04T09:19:34.5622577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.5623343Z with policy():
2025-12-04T09:19:34.5624017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.5624787Z raise RuntimeError(msg)
2025-12-04T09:19:34.5626242Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 619642880.
2025-12-04T09:19:34.5627609Z 2025-12-04T09:19:34.5627842Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5628920Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5629792Z 2025-12-04T09:19:34.5630058Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5630765Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.5631248Z ============================== 1 failed in 8.40s =============================== 2025-12-04T09:19:34.5631637Z Got exit code 1 2025-12-04T09:19:34.5631914Z Retrying single test... 2025-12-04T09:19:34.5632905Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-50fd36707db41f77.xml 2025-12-04T09:19:34.5633864Z ============================= test session starts ============================== 2025-12-04T09:19:34.5634516Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.5635100Z cachedir: .pytest_cache 2025-12-04T09:19:34.5635795Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.5636542Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.5636889Z configfile: pytest.ini 2025-12-04T09:19:34.5637594Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.5638441Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.5639568Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5640593Z Running 1 items in this shard 2025-12-04T09:19:34.5640798Z 2025-12-04T09:19:34.5641881Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda I1204 09:14:30.304000 19750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 19802 2025-12-04T09:19:34.5643559Z I1204 09:14:30.305000 19750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 19803 2025-12-04T09:19:34.5644743Z I1204 09:14:30.305000 19750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 19804 2025-12-04T09:19:34.5645811Z I1204 09:14:30.306000 19750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 19805 2025-12-04T09:19:34.5648132Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5650031Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5651931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. 
FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5653838Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5655770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5657922Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5659926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5662012Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5662789Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5663936Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5665627Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5667267Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5669027Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5670484Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5671912Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5673424Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5674921Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5676445Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5678008Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5679512Z [rank0]:E1204 09:14:36.831000 19802 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5680905Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5682319Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5684399Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 2025-12-04T09:19:34.5686329Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5687369Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5689130Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5690664Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5691765Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5693020Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.5694040Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5695034Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5696770Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5698442Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5700107Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5701645Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5703149Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5704748Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.5706358Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5708014Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5709720Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5711079Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5712455Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5713866Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5715911Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 607059968 and is now 619642880. 2025-12-04T09:19:34.5717837Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5718855Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5720651Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5722582Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5723796Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5725189Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.5726310Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5727422Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5729099Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5730746Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5732372Z [rank1]:E1204 09:14:36.832000 19803 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5733951Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5735289Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5736940Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5738669Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5740252Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5741840Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5743388Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5744947Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5746552Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5749074Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 609157120 and is now 619642880. 
2025-12-04T09:19:34.5750994Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5752089Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5753839Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5755321Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5756390Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5757623Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.5758629Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5759617Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5761098Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5762538Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5763983Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5765333Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5766723Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5768121Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5769517Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5770909Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5772310Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5773676Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5775037Z [rank2]:E1204 09:14:36.832000 19804 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5776523Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5778988Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 607059968 and is now 619642880. 2025-12-04T09:19:34.5781222Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5782376Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5784333Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5785996Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5787210Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5788709Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.5789525Z dist init r=0, world=4 2025-12-04T09:19:34.5789764Z dist init r=2, world=4 2025-12-04T09:19:34.5789999Z dist init r=3, world=4 2025-12-04T09:19:34.5790228Z dist init r=1, world=4 2025-12-04T09:19:34.5790450Z FAILED [8.2962s] [100%] 2025-12-04T09:19:34.5790604Z 2025-12-04T09:19:34.5790737Z =================================== FAILURES =================================== 2025-12-04T09:19:34.5791272Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda __ 2025-12-04T09:19:34.5791763Z Traceback (most recent call last): 2025-12-04T09:19:34.5792452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.5793147Z self._join_processes(fn) 2025-12-04T09:19:34.5793852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.5794603Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.5795436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.5796191Z raise RuntimeError(error) 2025-12-04T09:19:34.5796574Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.5796992Z Traceback (most recent call last): 2025-12-04T09:19:34.5797675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5798365Z getattr(self, test_name)() 2025-12-04T09:19:34.5799012Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5799682Z fn() 2025-12-04T09:19:34.5800255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5800910Z method(*args, **kwargs) 2025-12-04T09:19:34.5801530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5802190Z method(*args, **kwargs) 2025-12-04T09:19:34.5802813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5803454Z with policy(): 2025-12-04T09:19:34.5804054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5804741Z raise RuntimeError(msg) 2025-12-04T09:19:34.5806023Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 607059968 and is now 619642880. 2025-12-04T09:19:34.5807282Z 2025-12-04T09:19:34.5807479Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5808448Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5809226Z 2025-12-04T09:19:34.5809465Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5809819Z 2025-12-04T09:19:34.5809823Z 2025-12-04T09:19:34.5810037Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.5810599Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.5811717Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-50fd36707db41f77.xml -
2025-12-04T09:19:34.5812765Z =========================== short test summary info ============================
2025-12-04T09:19:34.5813859Z FAILED [8.2962s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda - RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T09:19:34.5814878Z Traceback (most recent call last):
2025-12-04T09:19:34.5815581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.5816372Z getattr(self, test_name)()
2025-12-04T09:19:34.5817270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.5818031Z fn()
2025-12-04T09:19:34.5818682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5819449Z method(*args, **kwargs)
2025-12-04T09:19:34.5820167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5821092Z method(*args, **kwargs)
2025-12-04T09:19:34.5821917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.5822680Z with policy():
2025-12-04T09:19:34.5823361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.5824139Z raise RuntimeError(msg)
2025-12-04T09:19:34.5825589Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 607059968 and is now 619642880.
2025-12-04T09:19:34.5826951Z
2025-12-04T09:19:34.5827192Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.5828269Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda
2025-12-04T09:19:34.5829139Z
2025-12-04T09:19:34.5829415Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.5830009Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:19:34.5830515Z ======================= 1 failed, 7 deselected in 8.32s ========================
2025-12-04T09:19:34.5830927Z Got exit code 1
2025-12-04T09:19:34.5831199Z Retrying single test...
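The failure above comes from the CUDA memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 wraps around each test. As a rough illustration of the mechanism only (a minimal sketch, not PyTorch's actual implementation, which lives in torch/testing/_internal/common_utils.py and also compares driver-level figures, the source of the "CUDA driver allocated memory was ... and is now ..." numbers; the class name here is hypothetical):

import torch

class CudaLeakCheck:
    """Snapshot caching-allocator usage on entry; raise on exit if it grew."""

    def __enter__(self):
        torch.cuda.synchronize()
        # Bytes held by the caching allocator, per device, before the test body.
        self.before = [torch.cuda.memory_allocated(d)
                       for d in range(torch.cuda.device_count())]
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # don't mask the test's own exception
        torch.cuda.synchronize()
        for d, before in enumerate(self.before):
            after = torch.cuda.memory_allocated(d)
            if after > before:
                # Mirrors the log's wording: allocated memory "was {before}
                # and is now reported as {after} on device {d}".
                raise RuntimeError(
                    f"CUDA leak suspected: allocated memory was {before} "
                    f"and is now reported as {after} on device {d}")
        return False

In this run every rank reports the same caching-allocator delta (512 -> 2560 bytes) and the check fails again on retry, which is consistent with the test body genuinely leaving a small allocation behind rather than a flaky measurement.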
2025-12-04T09:19:34.5832081Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-434f2a168fab2502.xml
2025-12-04T09:19:34.5833265Z ============================= test session starts ==============================
2025-12-04T09:19:34.5833949Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:19:34.5834512Z cachedir: .pytest_cache
2025-12-04T09:19:34.5835189Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:19:34.5835913Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:19:34.5836251Z configfile: pytest.ini
2025-12-04T09:19:34.5836940Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:19:34.5837774Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T09:19:34.5838848Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda
2025-12-04T09:19:34.5839838Z Running 1 items in this shard
2025-12-04T09:19:34.5840041Z
2025-12-04T09:19:34.5841084Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda I1204 09:14:43.364000 20063 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 20115
2025-12-04T09:19:34.5842709Z I1204 09:14:43.365000 20063 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 20116
2025-12-04T09:19:34.5843770Z I1204 09:14:43.365000 20063 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 20117
2025-12-04T09:19:34.5844836Z I1204 09:14:43.366000 20063 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 20118
2025-12-04T09:19:34.5847064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.5848967Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.5850936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.5852826Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.5854739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.5856587Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5858768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5860778Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5861544Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5862672Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5864438Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5866108Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5867768Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5869373Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5870705Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5872129Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5873554Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5874969Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5876375Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5877758Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5879151Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5880634Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5882707Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 2025-12-04T09:19:34.5884646Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5885686Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5887466Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5888969Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5890069Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5891312Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.5892335Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5893400Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5894908Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5896449Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5898244Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5899787Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5901308Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5902920Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5904529Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5906115Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5907718Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T09:19:34.5909440Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5910884Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5912301Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5914362Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 604962816 and is now 619642880. 2025-12-04T09:19:34.5916301Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5917351Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5919117Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5920613Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5922075Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5923585Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.5924738Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5925888Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5927567Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5929231Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5930879Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5932420Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5933995Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5935417Z [rank1]:E1204 09:14:50.066000 20116 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5937118Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5938719Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5940328Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5941965Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5943518Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5945131Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5947456Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 619642880. 2025-12-04T09:19:34.5949627Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5950681Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5952429Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5953928Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5955080Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5956333Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.5957347Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5958356Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5959857Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5961327Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T09:19:34.5962802Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5964148Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5965498Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5966922Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5968343Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5969768Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5971224Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5972604Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5973993Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5975423Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5977848Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 489619456 and is now 619642880. 
2025-12-04T09:19:34.5980029Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5981205Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5983196Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5984960Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5986196Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5987588Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.5988383Z dist init r=2, world=4 2025-12-04T09:19:34.5988775Z dist init r=0, world=4 2025-12-04T09:19:34.5989141Z dist init r=3, world=4 2025-12-04T09:19:34.5989391Z dist init r=1, world=4 2025-12-04T09:19:34.5989639Z FAILED [8.3116s] [100%] 2025-12-04T09:19:34.5989795Z 2025-12-04T09:19:34.5989943Z =================================== FAILURES =================================== 2025-12-04T09:19:34.5990486Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda __ 2025-12-04T09:19:34.5991006Z Traceback (most recent call last): 2025-12-04T09:19:34.5991718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.5992428Z self._join_processes(fn) 2025-12-04T09:19:34.5993150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.5993936Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.5994731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.5995492Z raise RuntimeError(error) 2025-12-04T09:19:34.5995899Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.5996341Z Traceback (most recent call last): 2025-12-04T09:19:34.5997030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5997746Z getattr(self, test_name)() 2025-12-04T09:19:34.5998477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5999171Z fn() 2025-12-04T09:19:34.5999740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6000419Z method(*args, **kwargs) 2025-12-04T09:19:34.6001059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6001725Z method(*args, **kwargs) 2025-12-04T09:19:34.6002370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6003046Z with policy(): 
2025-12-04T09:19:34.6003667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6004342Z raise RuntimeError(msg) 2025-12-04T09:19:34.6005815Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 619642880. 2025-12-04T09:19:34.6007116Z 2025-12-04T09:19:34.6007326Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6008536Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.6009376Z 2025-12-04T09:19:34.6009640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6010103Z 2025-12-04T09:19:34.6010107Z 2025-12-04T09:19:34.6010330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6010948Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.6012190Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-434f2a168fab2502.xml - 2025-12-04T09:19:34.6013333Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6014509Z FAILED [8.3116s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.6015629Z Traceback (most recent call last): 2025-12-04T09:19:34.6016506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6017483Z getattr(self, test_name)() 2025-12-04T09:19:34.6018249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6019034Z fn() 2025-12-04T09:19:34.6019696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6020454Z method(*args, **kwargs) 2025-12-04T09:19:34.6021365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6022128Z method(*args, **kwargs) 2025-12-04T09:19:34.6022840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6023577Z with policy(): 2025-12-04T09:19:34.6024266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6025040Z raise RuntimeError(msg) 2025-12-04T09:19:34.6026624Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 619642880. 
2025-12-04T09:19:34.6028002Z 2025-12-04T09:19:34.6028225Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6029307Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.6030167Z 2025-12-04T09:19:34.6030450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6031045Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.6031542Z ======================= 1 failed, 7 deselected in 8.33s ======================== 2025-12-04T09:19:34.6031970Z Got exit code 1 2025-12-04T09:19:34.6032893Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.6033944Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.6035041Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-810575b51f00acc3.xml 2025-12-04T09:19:34.6035925Z ============================= test session starts ============================== 2025-12-04T09:19:34.6036514Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6037040Z cachedir: .pytest_cache 2025-12-04T09:19:34.6037667Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6038433Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6038738Z configfile: pytest.ini 2025-12-04T09:19:34.6039386Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6040185Z collecting ... collected 8 items / 1 deselected / 7 selected 2025-12-04T09:19:34.6040628Z stepcurrent: skipping 1 already run items. 2025-12-04T09:19:34.6040966Z Running 7 items in this shard 2025-12-04T09:19:34.6041168Z 2025-12-04T09:19:34.6042144Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda I1204 09:14:56.374000 20376 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 20428 2025-12-04T09:19:34.6043680Z I1204 09:14:56.375000 20376 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 20429 2025-12-04T09:19:34.6044693Z I1204 09:14:56.376000 20376 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 20430 2025-12-04T09:19:34.6045692Z I1204 09:14:56.376000 20376 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 20431 2025-12-04T09:19:34.6047796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
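Editor's note: every rank in the session above emits the same UserWarning because the test hands FSDP the bare string "cuda" as `device_id`. The warning names its own two fixes: make the indexed device current with `torch.cuda.set_device()` before constructing FSDP, or pass a device that carries an explicit index. A minimal sketch of both, assuming a process group has already been initialized on this rank (the helper name `wrap_model` is invented here):

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(rank: int) -> FSDP:
        # Assumes torch.distributed.init_process_group(...) has already run on this rank.
        # Fix 1: make the indexed device current so FSDP can infer device `rank`.
        torch.cuda.set_device(rank)
        model = nn.Linear(8, 8).cuda(rank)
        # Fix 2: pass an explicit index instead of device_id="cuda".
        return FSDP(model, device_id=torch.device("cuda", rank))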
2025-12-04T09:19:34.6049595Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6051394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6053187Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6055031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6057080Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6059096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6061108Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6061881Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6063023Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6064703Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6066367Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6068022Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6069657Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6071003Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6072410Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6073834Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6075253Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.6076676Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6078057Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6079432Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6080862Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6082990Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 598671360 and is now 619642880. 2025-12-04T09:19:34.6084949Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6085993Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6087745Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6089238Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6090349Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6091609Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6092614Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6093627Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6095128Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6096891Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6098579Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6100107Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6101624Z [rank0]:E1204 09:15:03.021000 20428 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6103224Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6104829Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6106433Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6108024Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6109616Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6111009Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6112445Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6114584Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 
2025-12-04T09:19:34.6116517Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6117566Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6119344Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6120980Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6122368Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6123767Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6124923Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6126064Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6127859Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6129513Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6131170Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6132710Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6134256Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6135681Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6137404Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6139010Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6140607Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6142164Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6143726Z [rank2]:E1204 09:15:03.022000 20430 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6145408Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6147732Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 604962816 and is now 619642880. 2025-12-04T09:19:34.6150007Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6151047Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6152820Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6154307Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6155402Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6156657Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6157676Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6158726Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6160229Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6161703Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6163168Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6164531Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6165873Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6167297Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6168719Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6170138Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6171555Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6172931Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6174367Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6175801Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6178263Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 498008064 and is now 619642880. 2025-12-04T09:19:34.6180464Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6181635Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6183636Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6185328Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6186562Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6187970Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6188891Z dist init r=2, world=4 2025-12-04T09:19:34.6189149Z dist init r=3, world=4 2025-12-04T09:19:34.6189403Z dist init r=0, world=4 2025-12-04T09:19:34.6189640Z dist init r=1, world=4 2025-12-04T09:19:34.6189894Z FAILED [8.2813s] [ 14%] 2025-12-04T09:19:34.6190050Z 2025-12-04T09:19:34.6190199Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6190755Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda __ 2025-12-04T09:19:34.6191264Z Traceback (most recent call last): 2025-12-04T09:19:34.6191975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6192693Z self._join_processes(fn) 2025-12-04T09:19:34.6193402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in 
_join_processes 2025-12-04T09:19:34.6194183Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6194977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6195755Z raise RuntimeError(error) 2025-12-04T09:19:34.6196152Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.6196592Z Traceback (most recent call last): 2025-12-04T09:19:34.6197293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6197993Z getattr(self, test_name)() 2025-12-04T09:19:34.6198670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6199357Z fn() 2025-12-04T09:19:34.6199938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6200608Z method(*args, **kwargs) 2025-12-04T09:19:34.6201248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6202656Z method(*args, **kwargs) 2025-12-04T09:19:34.6203289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6203969Z with policy(): 2025-12-04T09:19:34.6204588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6205282Z raise RuntimeError(msg) 2025-12-04T09:19:34.6206560Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 598671360 and is now 619642880. 2025-12-04T09:19:34.6207782Z 2025-12-04T09:19:34.6207978Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6208947Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6209715Z 2025-12-04T09:19:34.6209970Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6210324Z 2025-12-04T09:19:34.6210329Z 2025-12-04T09:19:34.6210544Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6211096Z Process 1 terminated with exit code 10, terminating remaining processes. 
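Editor's note: "Process 1 terminated with exit code 10, terminating remaining processes." reflects how the multi-process harness drives these tests: it spawns one worker per rank, watches their exit codes, tears the rest down as soon as one worker fails, and re-raises that worker's exception in the parent (the `_join_processes` / `_check_return_codes` frames in the traceback above). The following is an illustrative sketch of that pattern only, not the actual common_distributed.py code; `_worker` and `run_in_processes` are invented names.

    import multiprocessing as mp
    import time

    def _worker(rank: int, world_size: int) -> None:
        print(f"dist init r={rank}, world={world_size}")
        # ... per-rank test body goes here; a non-zero sys.exit() signals failure ...

    def run_in_processes(world_size: int = 4, poll: float = 0.1) -> None:
        ctx = mp.get_context("spawn")
        procs = [ctx.Process(target=_worker, args=(r, world_size)) for r in range(world_size)]
        for p in procs:
            p.start()
        # Poll until all workers finish; on the first non-zero exit code,
        # terminate whatever is still running and surface the failure.
        while any(p.is_alive() for p in procs):
            for rank, p in enumerate(procs):
                if p.exitcode not in (None, 0):
                    print(f"Process {rank} terminated with exit code {p.exitcode}, "
                          "terminating remaining processes.")
                    for other in procs:
                        if other.is_alive():
                            other.terminate()
                    raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")
            time.sleep(poll)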
2025-12-04T09:19:34.6212220Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-810575b51f00acc3.xml - 2025-12-04T09:19:34.6213262Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6214404Z FAILED [8.2813s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.6215415Z Traceback (most recent call last): 2025-12-04T09:19:34.6216135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6217145Z getattr(self, test_name)() 2025-12-04T09:19:34.6217898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6218678Z fn() 2025-12-04T09:19:34.6219337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6220106Z method(*args, **kwargs) 2025-12-04T09:19:34.6220981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6221925Z method(*args, **kwargs) 2025-12-04T09:19:34.6222651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6223403Z with policy(): 2025-12-04T09:19:34.6224097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6224874Z raise RuntimeError(msg) 2025-12-04T09:19:34.6226323Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 598671360 and is now 619642880. 2025-12-04T09:19:34.6227692Z 2025-12-04T09:19:34.6227929Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6229014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6229889Z 2025-12-04T09:19:34.6230277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6230878Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.6231374Z ======================= 1 failed, 1 deselected in 8.30s ======================== 2025-12-04T09:19:34.6231803Z Got exit code 1 2025-12-04T09:19:34.6232082Z Retrying single test... 
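Editor's note: "Got exit code 1" followed by "Retrying single test..." is the runner's flakiness filter at work: after a pytest invocation fails, the single failing test is rerun in a fresh session, and only a failure on every attempt is reported as "FAILED CONSISTENTLY" (the strategy0 test earlier in the log already reached that state, and the run continued because continue-through-error was set). Below is a rough sketch of that retry loop, assuming nothing about the real run_test.py beyond what the log shows; `rerun_single_test` is an invented helper.

    import subprocess
    import sys

    def rerun_single_test(test_file: str, node_id: str, attempts: int = 2) -> bool:
        """Return True if the test passes on some attempt, False if it fails every time."""
        for attempt in range(attempts):
            proc = subprocess.run(
                [sys.executable, "-m", "pytest", "-x", f"{test_file}::{node_id}"]
            )
            print(f"Got exit code {proc.returncode}")
            if proc.returncode == 0:
                return True
            if attempt < attempts - 1:
                print("Retrying single test...")
        return False

For the failure above the call would look like rerun_single_test("test/distributed/fsdp/test_fsdp_exec_order.py", "TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda"), matching the node id in the short test summary.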
2025-12-04T09:19:34.6232963Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-acd65444fa26961a.xml 2025-12-04T09:19:34.6233970Z ============================= test session starts ============================== 2025-12-04T09:19:34.6234567Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6235104Z cachedir: .pytest_cache 2025-12-04T09:19:34.6235726Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6236418Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6236736Z configfile: pytest.ini 2025-12-04T09:19:34.6237381Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6238159Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.6239185Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6240122Z Running 1 items in this shard 2025-12-04T09:19:34.6240309Z 2025-12-04T09:19:34.6241290Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda I1204 09:15:09.394000 20689 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 20741 2025-12-04T09:19:34.6242896Z I1204 09:15:09.395000 20689 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 20742 2025-12-04T09:19:34.6243908Z I1204 09:15:09.395000 20689 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 20743 2025-12-04T09:19:34.6244920Z I1204 09:15:09.396000 20689 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 20744 2025-12-04T09:19:34.6247689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6249748Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6251637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6253534Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6255429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.6257634Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6259745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6261764Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6262518Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6263662Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6265357Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6267043Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6268697Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6270287Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6271855Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6273333Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6274758Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6276161Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6277586Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6279140Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6280611Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6282134Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6284309Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 619642880. 2025-12-04T09:19:34.6286365Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6287467Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6289392Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6291067Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6292153Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6293404Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6294423Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6295434Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6297224Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6298873Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6300527Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6302067Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6303578Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6305268Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6306853Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6308447Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6310016Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T09:19:34.6311401Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6312779Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6314201Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6316265Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 2025-12-04T09:19:34.6318210Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6319261Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6321397Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6323079Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6324311Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6325723Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6326878Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6328003Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6329687Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6331337Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6332987Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6334598Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6335939Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6337695Z [rank2]:E1204 09:15:15.973000 20743 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6339305Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6340905Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6342518Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6344068Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6345636Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6347246Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6349608Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 619642880. 2025-12-04T09:19:34.6351628Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6352660Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6354422Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6355924Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6357023Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6358515Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6359586Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6360659Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6362243Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6363805Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T09:19:34.6365397Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6366839Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6368263Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6369768Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6371347Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6372760Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6374183Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6375561Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6377235Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6378845Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6381233Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 495910912 and is now 619642880. 
2025-12-04T09:19:34.6383423Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6384596Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6386582Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6388269Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6389628Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6390888Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6391606Z dist init r=1, world=4 2025-12-04T09:19:34.6391866Z dist init r=3, world=4 2025-12-04T09:19:34.6392106Z dist init r=0, world=4 2025-12-04T09:19:34.6392359Z dist init r=2, world=4 2025-12-04T09:19:34.6392612Z FAILED [8.2872s] [100%] 2025-12-04T09:19:34.6392764Z 2025-12-04T09:19:34.6392901Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6393451Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda __ 2025-12-04T09:19:34.6394023Z Traceback (most recent call last): 2025-12-04T09:19:34.6394718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6395439Z self._join_processes(fn) 2025-12-04T09:19:34.6396161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.6396941Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6397723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6398495Z raise RuntimeError(error) 2025-12-04T09:19:34.6398902Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6399344Z Traceback (most recent call last): 2025-12-04T09:19:34.6400030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6400752Z getattr(self, test_name)() 2025-12-04T09:19:34.6401425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6402099Z fn() 2025-12-04T09:19:34.6402683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6403360Z method(*args, **kwargs) 2025-12-04T09:19:34.6403998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6404661Z method(*args, **kwargs) 2025-12-04T09:19:34.6405297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6405967Z with policy(): 
2025-12-04T09:19:34.6406565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6407255Z raise RuntimeError(msg) 2025-12-04T09:19:34.6408589Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 619642880. 2025-12-04T09:19:34.6409798Z 2025-12-04T09:19:34.6410006Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6410958Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6411741Z 2025-12-04T09:19:34.6411981Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6412349Z 2025-12-04T09:19:34.6412353Z 2025-12-04T09:19:34.6412555Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6413122Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.6414255Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-acd65444fa26961a.xml - 2025-12-04T09:19:34.6415285Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6416434Z FAILED [8.2872s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6417730Z Traceback (most recent call last): 2025-12-04T09:19:34.6418531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6419325Z getattr(self, test_name)() 2025-12-04T09:19:34.6420177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6421148Z fn() 2025-12-04T09:19:34.6421798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6422571Z method(*args, **kwargs) 2025-12-04T09:19:34.6423294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6424056Z method(*args, **kwargs) 2025-12-04T09:19:34.6424761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6425513Z with policy(): 2025-12-04T09:19:34.6426203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6426963Z raise RuntimeError(msg) 2025-12-04T09:19:34.6440805Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 619642880. 
2025-12-04T09:19:34.6442128Z 2025-12-04T09:19:34.6442340Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6443289Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6444050Z 2025-12-04T09:19:34.6444284Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6444812Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.6445249Z ======================= 1 failed, 7 deselected in 8.31s ======================== 2025-12-04T09:19:34.6445615Z Got exit code 1 2025-12-04T09:19:34.6445860Z Retrying single test... 2025-12-04T09:19:34.6446636Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d7f6d912312cc834.xml 2025-12-04T09:19:34.6447644Z ============================= test session starts ============================== 2025-12-04T09:19:34.6448238Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6448771Z cachedir: .pytest_cache 2025-12-04T09:19:34.6449393Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6450077Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6450383Z configfile: pytest.ini 2025-12-04T09:19:34.6451024Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6451796Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.6452826Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6453749Z Running 1 items in this shard 2025-12-04T09:19:34.6453936Z 2025-12-04T09:19:34.6454920Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda I1204 09:15:22.454000 21002 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 21054 2025-12-04T09:19:34.6456537Z I1204 09:15:22.455000 21002 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 21055 2025-12-04T09:19:34.6457834Z I1204 09:15:22.456000 21002 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 21056 2025-12-04T09:19:34.6458962Z I1204 09:15:22.456000 21002 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 21057 2025-12-04T09:19:34.6461467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6463476Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6465481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. 
FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6467480Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6469530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6471307Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6473082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6474856Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6475525Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6476581Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6478077Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6479545Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6480999Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6482359Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6483702Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6485110Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6486519Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6487921Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6489337Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6490769Z [rank0]:E1204 09:15:29.098000 21054 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6492147Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6493567Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6495614Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 2025-12-04T09:19:34.6497926Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6499092Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6501070Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6502743Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6503955Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6505365Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6506578Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6507706Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6509421Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6510878Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6512336Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6513704Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6515218Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6516709Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.6518204Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6519742Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6521565Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6523117Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6524666Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6526259Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6528601Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 604962816 and is now 619642880. 2025-12-04T09:19:34.6530788Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6531958Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6533944Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6535440Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6536585Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6538255Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6539392Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6540507Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6542176Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6543832Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6545480Z [rank1]:E1204 09:15:29.099000 21055 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6547010Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6548521Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6550296Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6551855Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6553358Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6554855Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6556301Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6557753Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6559451Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6561705Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 598671360 and is now 619642880. 
2025-12-04T09:19:34.6563810Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6564939Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6566847Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6568532Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6569722Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6571078Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6572181Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6573259Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6574981Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6576774Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6578418Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6579934Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6581449Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6583096Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6584685Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6586270Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6587854Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6589453Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6590916Z [rank3]:E1204 09:15:29.100000 21057 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6592416Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6594595Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 581894144 and is now 619642880. 2025-12-04T09:19:34.6596635Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6597778Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6599599Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6601082Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6602162Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6603395Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6604099Z dist init r=1, world=4 2025-12-04T09:19:34.6604352Z dist init r=2, world=4 2025-12-04T09:19:34.6604593Z dist init r=0, world=4 2025-12-04T09:19:34.6604824Z dist init r=3, world=4 2025-12-04T09:19:34.6605061Z FAILED [8.3149s] [100%] 2025-12-04T09:19:34.6605212Z 2025-12-04T09:19:34.6605355Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6605890Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda __ 2025-12-04T09:19:34.6606399Z Traceback (most recent call last): 2025-12-04T09:19:34.6607099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6607799Z self._join_processes(fn) 2025-12-04T09:19:34.6608506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.6609281Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6610063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6610886Z raise RuntimeError(error) 2025-12-04T09:19:34.6611280Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.6611712Z Traceback (most recent call last): 2025-12-04T09:19:34.6612410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6613106Z getattr(self, test_name)() 2025-12-04T09:19:34.6613771Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6614447Z fn() 2025-12-04T09:19:34.6615010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6615669Z method(*args, **kwargs) 2025-12-04T09:19:34.6616563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6617469Z method(*args, **kwargs) 2025-12-04T09:19:34.6618165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6618914Z with policy(): 2025-12-04T09:19:34.6619595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6620347Z raise RuntimeError(msg) 2025-12-04T09:19:34.6621959Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 581894144 and is now 619642880. 2025-12-04T09:19:34.6623326Z 2025-12-04T09:19:34.6623544Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6624623Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6625478Z 2025-12-04T09:19:34.6625752Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6626149Z 2025-12-04T09:19:34.6626265Z 2025-12-04T09:19:34.6626492Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6627115Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.6628387Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d7f6d912312cc834.xml - 2025-12-04T09:19:34.6629562Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6630764Z FAILED [8.3149s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.6631904Z Traceback (most recent call last): 2025-12-04T09:19:34.6632779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6633494Z getattr(self, test_name)() 2025-12-04T09:19:34.6634151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6634845Z fn() 2025-12-04T09:19:34.6635422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6636086Z method(*args, **kwargs) 2025-12-04T09:19:34.6636716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6637385Z method(*args, **kwargs) 2025-12-04T09:19:34.6638080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6638728Z with policy(): 2025-12-04T09:19:34.6639340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6640000Z raise RuntimeError(msg) 2025-12-04T09:19:34.6641272Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 581894144 and is now 619642880. 2025-12-04T09:19:34.6642482Z 2025-12-04T09:19:34.6642672Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6643630Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6644393Z 2025-12-04T09:19:34.6644636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6645145Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
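Each of the FAILURES tracebacks in this log is raised by the parent test process, not by the rank that leaked: the harness joins the per-rank workers and turns any nonzero child exit code (10 here) into the "Process N exited with error code 10" RuntimeError shown above. A minimal sketch of that join-and-check pattern, with illustrative names rather than the common_distributed.py implementation, assuming a spawn-based launcher:

    # Illustrative parent/worker pattern; _worker stands in for the per-rank test body.
    import multiprocessing as mp

    def _worker(rank, world_size):
        # ... run this rank's share of the test; exit nonzero (e.g. 10) on failure ...
        pass

    def run_in_processes(world_size=4):
        ctx = mp.get_context("spawn")
        procs = [ctx.Process(target=_worker, args=(r, world_size)) for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")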
2025-12-04T09:19:34.6645585Z ======================= 1 failed, 7 deselected in 8.34s ======================== 2025-12-04T09:19:34.6645953Z Got exit code 1 2025-12-04T09:19:34.6646663Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6647715Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.6648811Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d3fa58c4cf34965f.xml 2025-12-04T09:19:34.6649673Z ============================= test session starts ============================== 2025-12-04T09:19:34.6650246Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6650763Z cachedir: .pytest_cache 2025-12-04T09:19:34.6651375Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6652106Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6652404Z configfile: pytest.ini 2025-12-04T09:19:34.6653044Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6653818Z collecting ... collected 8 items / 2 deselected / 6 selected 2025-12-04T09:19:34.6654229Z stepcurrent: skipping 2 already run items. 2025-12-04T09:19:34.6654567Z Running 6 items in this shard 2025-12-04T09:19:34.6654748Z 2025-12-04T09:19:34.6655840Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda I1204 09:15:35.514000 21315 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 21367 2025-12-04T09:19:34.6657832Z I1204 09:15:35.515000 21315 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 21368 2025-12-04T09:19:34.6658958Z I1204 09:15:35.516000 21315 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 21369 2025-12-04T09:19:34.6660083Z I1204 09:15:35.516000 21315 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 21370 2025-12-04T09:19:34.6662444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6664517Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6666532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
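The sequence above ("Got exit code 1", "Retrying single test...", then "FAILED CONSISTENTLY" and "continuing with the rest of the tests due to continue-through-error being set") is the shard runner's retry policy: a failing test is rerun on its own, and only if the rerun also fails is it reported as consistently failing, after which the remaining tests in the shard still run. A hypothetical control-flow sketch only, not the actual runner, where run_pytest is a stand-in that runs pytest on a selection and returns the first failing test or None:

    # Hypothetical sketch of retry-then-continue; run_pytest is an assumed helper.
    def run_shard(tests, run_pytest):
        remaining = list(tests)
        consistently_failing = []
        while remaining:
            failed = run_pytest(remaining)                        # stops at the first failure
            if failed is None:
                break
            remaining = remaining[remaining.index(failed) + 1:]   # skip already-run items
            if run_pytest([failed]) is not None:                  # retry just the failing test
                consistently_failing.append(failed)               # reported as FAILED CONSISTENTLY
            # continue-through-error: keep running the rest either way
        return consistently_failing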
2025-12-04T09:19:34.6668531Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6670517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6672508Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6674281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6676048Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6676717Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6677705Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6679194Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6680656Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6682164Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6683512Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6684838Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6686244Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6687655Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6688090Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6688953Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6689347Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6690208Z [rank0]:E1204 09:15:42.193000 21367 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6690696Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6692307Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.6692639Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6693224Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6694395Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6694716Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6695362Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6695844Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6696306Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6696983Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6698046Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6698567Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6699558Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6699973Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6700942Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6701435Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6702402Z [rank2]:E1204 09:15:42.198000 21369 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6702888Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6703852Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6704369Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6705346Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6705840Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6707647Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.6708029Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6708790Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6710079Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6710401Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6711047Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6711533Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6711939Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6712485Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6713379Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6713838Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6714716Z [rank1]:E1204 09:15:42.199000 21368 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6715079Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6715932Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6716361Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6717227Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6717657Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6718571Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6718966Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6719831Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6720267Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6722274Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
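The UserWarnings in this session come from a `device_id` of cuda with no index, and the warning text itself names the two fixes: call `torch.cuda.set_device()` before constructing FSDP, or pass a device with an explicit index. A minimal sketch of both, where `model` and `rank` are placeholders and a process group is assumed to be initialized already:

    # Sketch of the fixes the FSDP warning suggests; `model` and `rank` are placeholders.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap(model, rank):
        # Option 1: make the current device explicit before wrapping.
        torch.cuda.set_device(rank)
        return FSDP(model, device_id=torch.cuda.current_device())

        # Option 2: pass an indexed device instead of the bare "cuda" spec.
        # return FSDP(model, device_id=torch.device("cuda", rank))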
2025-12-04T09:19:34.6722650Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6723310Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6724620Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6724982Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6725716Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6726352Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6726813Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6727343Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6728341Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6728857Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6729850Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6730251Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6731209Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6731696Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6732665Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6733327Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6734314Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6734712Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6735576Z [rank3]:E1204 09:15:42.199000 21370 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6736010Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6737997Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.6738363Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6739021Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6740323Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6740686Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6741465Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6742008Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6742113Z dist init r=0, world=4 2025-12-04T09:19:34.6742210Z dist init r=1, world=4 2025-12-04T09:19:34.6742305Z dist init r=2, world=4 2025-12-04T09:19:34.6742407Z dist init r=3, world=4 2025-12-04T09:19:34.6743571Z [rank0]:[W1204 09:15:42.210480373 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.6743671Z FAILED [8.3129s] [ 16%] 2025-12-04T09:19:34.6743677Z 2025-12-04T09:19:34.6743834Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6744274Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.6744401Z Traceback (most recent call last): 2025-12-04T09:19:34.6744945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6745055Z self._join_processes(fn) 2025-12-04T09:19:34.6745652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.6745796Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6746486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6746615Z raise RuntimeError(error) 2025-12-04T09:19:34.6746859Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6746993Z Traceback (most recent call last): 2025-12-04T09:19:34.6747541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6747656Z getattr(self, test_name)() 2025-12-04T09:19:34.6748207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6748299Z fn() 2025-12-04T09:19:34.6748814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6749039Z method(*args, **kwargs) 2025-12-04T09:19:34.6749498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6749609Z method(*args, **kwargs) 2025-12-04T09:19:34.6750063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6750155Z with policy(): 2025-12-04T09:19:34.6750626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6750728Z raise RuntimeError(msg) 2025-12-04T09:19:34.6751946Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.6751951Z 2025-12-04T09:19:34.6752153Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6752920Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6752938Z 2025-12-04T09:19:34.6753242Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6753247Z 2025-12-04T09:19:34.6753251Z 2025-12-04T09:19:34.6753452Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6753703Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.6754474Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d3fa58c4cf34965f.xml - 2025-12-04T09:19:34.6754647Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6755562Z FAILED [8.3129s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6755676Z Traceback (most recent call last): 2025-12-04T09:19:34.6756187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6756293Z getattr(self, test_name)() 2025-12-04T09:19:34.6756788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6756875Z fn() 2025-12-04T09:19:34.6757331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6757447Z method(*args, **kwargs) 2025-12-04T09:19:34.6757900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6758048Z method(*args, **kwargs) 2025-12-04T09:19:34.6758516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6758613Z with policy(): 2025-12-04T09:19:34.6759086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6759188Z raise RuntimeError(msg) 2025-12-04T09:19:34.6760392Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.6760397Z 2025-12-04T09:19:34.6760607Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6761373Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6761378Z 2025-12-04T09:19:34.6761636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6761799Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
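Earlier in this session, rank 0 also logged the ProcessGroupNCCL warning that destroy_process_group() was not called before the program exited. The fix that the warning and the linked distributed docs point to is an explicit teardown at the end of each worker; a minimal sketch, assuming the usual RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT environment variables are provided by the launcher:

    # Minimal teardown sketch; the body of main() is a placeholder for the real work.
    import torch.distributed as dist

    def main():
        dist.init_process_group(backend="nccl")
        try:
            ...  # collectives / the test body
        finally:
            dist.destroy_process_group()  # avoids the resource-leak warning at exit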
2025-12-04T09:19:34.6761964Z ======================= 1 failed, 2 deselected in 8.33s ======================== 2025-12-04T09:19:34.6762066Z Got exit code 1 2025-12-04T09:19:34.6762166Z Retrying single test... 2025-12-04T09:19:34.6762793Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5b8ecd9108f02ac.xml 2025-12-04T09:19:34.6762944Z ============================= test session starts ============================== 2025-12-04T09:19:34.6763258Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6763377Z cachedir: .pytest_cache 2025-12-04T09:19:34.6763840Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6764003Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6764113Z configfile: pytest.ini 2025-12-04T09:19:34.6764594Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6764794Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.6765631Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6765736Z Running 1 items in this shard 2025-12-04T09:19:34.6765741Z 2025-12-04T09:19:34.6768811Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda I1204 09:15:48.914000 21652 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 21704 2025-12-04T09:19:34.6769279Z I1204 09:15:48.915000 21652 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 21705 2025-12-04T09:19:34.6769731Z I1204 09:15:48.915000 21652 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 21706 2025-12-04T09:19:34.6770168Z I1204 09:15:48.916000 21652 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 21707 2025-12-04T09:19:34.6771709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6771938Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6773466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6773630Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6775149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. 
FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6775318Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6777136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6777318Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6777783Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6778344Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6779358Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6779942Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6780952Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6781355Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6782337Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6782838Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6783820Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6784314Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6785276Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6785741Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6786766Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6787279Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.6789161Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.6789501Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6790097Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6791268Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6791607Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6792247Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6792748Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6793158Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6793642Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6794628Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6795088Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6795983Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6796341Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6797217Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6797657Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6798528Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6798973Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.6799831Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6800310Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6801174Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6801627Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6803231Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.6803579Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6804174Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6805348Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6805692Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6806332Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6806841Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6807297Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6807788Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6808678Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6809135Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6810026Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6810391Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6811260Z 
[rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6811695Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6812560Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6813514Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6814375Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6814790Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6815650Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6816108Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6818101Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.6818494Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6819159Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6820472Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6821045Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6821894Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6822459Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6822914Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6823463Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6824469Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6824985Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6825995Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6826400Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6827378Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6827871Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6828922Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6829420Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6830387Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6830846Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6831814Z [rank3]:E1204 09:15:55.621000 21707 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6832325Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6834158Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.6834520Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6835145Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6836388Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6836800Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6837480Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6838009Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6838110Z dist init r=1, world=4 2025-12-04T09:19:34.6838384Z dist init r=2, world=4 2025-12-04T09:19:34.6838481Z dist init r=3, world=4 2025-12-04T09:19:34.6838579Z dist init r=0, world=4 2025-12-04T09:19:34.6839724Z [rank0]:[W1204 09:15:56.667134069 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.6839830Z FAILED [8.8295s] [100%] 2025-12-04T09:19:34.6839836Z 2025-12-04T09:19:34.6839983Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6840426Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.6840548Z Traceback (most recent call last): 2025-12-04T09:19:34.6841097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6841213Z self._join_processes(fn) 2025-12-04T09:19:34.6841876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.6842081Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6842659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6842791Z raise RuntimeError(error) 2025-12-04T09:19:34.6843017Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6843136Z Traceback (most recent call last): 2025-12-04T09:19:34.6843664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6843773Z getattr(self, test_name)() 2025-12-04T09:19:34.6844284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6844385Z fn() 2025-12-04T09:19:34.6844868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6844986Z method(*args, **kwargs) 2025-12-04T09:19:34.6845465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6845571Z method(*args, **kwargs) 2025-12-04T09:19:34.6846061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6846156Z with policy(): 2025-12-04T09:19:34.6846637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6846755Z raise RuntimeError(msg) 2025-12-04T09:19:34.6848022Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.6848032Z 2025-12-04T09:19:34.6848251Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6849114Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6849120Z 2025-12-04T09:19:34.6849388Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6849393Z 2025-12-04T09:19:34.6849550Z Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.6849666Z Traceback (most recent call last): 2025-12-04T09:19:34.6850299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6850402Z getattr(self, test_name)() 2025-12-04T09:19:34.6850887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6850977Z fn() 2025-12-04T09:19:34.6851425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6851527Z method(*args, **kwargs) 2025-12-04T09:19:34.6851980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6852076Z method(*args, **kwargs) 2025-12-04T09:19:34.6852536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6852623Z with policy(): 2025-12-04T09:19:34.6853073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6853183Z raise RuntimeError(msg) 2025-12-04T09:19:34.6854376Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.6854450Z 2025-12-04T09:19:34.6854650Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6855414Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6855419Z 2025-12-04T09:19:34.6855666Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6855671Z 2025-12-04T09:19:34.6855675Z 2025-12-04T09:19:34.6855875Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6856109Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.6857169Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5b8ecd9108f02ac.xml - 2025-12-04T09:19:34.6857343Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6858374Z FAILED [8.8295s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6858495Z Traceback (most recent call last): 2025-12-04T09:19:34.6859045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6859163Z getattr(self, test_name)() 2025-12-04T09:19:34.6859702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6859805Z fn() 2025-12-04T09:19:34.6860311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6860415Z method(*args, **kwargs) 2025-12-04T09:19:34.6860987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6861091Z method(*args, **kwargs) 2025-12-04T09:19:34.6861601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6861696Z with policy(): 2025-12-04T09:19:34.6862204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6862321Z raise RuntimeError(msg) 2025-12-04T09:19:34.6863670Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.6863679Z
2025-12-04T09:19:34.6863900Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.6864753Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda
2025-12-04T09:19:34.6864758Z
2025-12-04T09:19:34.6865022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.6865027Z
2025-12-04T09:19:34.6865197Z Process 3 exited with error code 10 and exception:
2025-12-04T09:19:34.6865319Z Traceback (most recent call last):
2025-12-04T09:19:34.6865876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.6866044Z     getattr(self, test_name)()
2025-12-04T09:19:34.6866587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.6866685Z     fn()
2025-12-04T09:19:34.6867197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.6867304Z     method(*args, **kwargs)
2025-12-04T09:19:34.6867819Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.6867922Z     method(*args, **kwargs)
2025-12-04T09:19:34.6868439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.6868533Z     with policy():
2025-12-04T09:19:34.6869141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.6869258Z     raise RuntimeError(msg)
2025-12-04T09:19:34.6870530Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640.
2025-12-04T09:19:34.6870536Z
2025-12-04T09:19:34.6870743Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.6871616Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda
2025-12-04T09:19:34.6871621Z
2025-12-04T09:19:34.6871855Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.6872021Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:19:34.6872179Z ======================= 1 failed, 7 deselected in 8.85s ========================
2025-12-04T09:19:34.6872277Z Got exit code 1
2025-12-04T09:19:34.6872370Z Retrying single test...
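Each run above also emits the same FSDP UserWarning: `device_id` was passed as a bare "cuda" with no index. Following the warning's own advice, either setting the current device before FSDP initialization or passing an explicit device index silences it. A short sketch under those assumptions (wrap_model, model, and rank are placeholders, not names from this test):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model, rank):
    # Option 1 from the warning: make the current device explicit first.
    torch.cuda.set_device(rank)
    # Option 2: pass a device_id with an explicit index instead of bare "cuda".
    return FSDP(model, device_id=torch.device("cuda", rank))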
2025-12-04T09:19:34.6873026Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-578e4c4077b7a803.xml 2025-12-04T09:19:34.6873179Z ============================= test session starts ============================== 2025-12-04T09:19:34.6873489Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6873585Z cachedir: .pytest_cache 2025-12-04T09:19:34.6874049Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6874154Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6874255Z configfile: pytest.ini 2025-12-04T09:19:34.6874728Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6874914Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.6875761Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6875860Z Running 1 items in this shard 2025-12-04T09:19:34.6875864Z 2025-12-04T09:19:34.6876945Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda I1204 09:16:02.204000 21989 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 22041 2025-12-04T09:19:34.6877390Z I1204 09:16:02.204000 21989 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 22042 2025-12-04T09:19:34.6877830Z I1204 09:16:02.205000 21989 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 22043 2025-12-04T09:19:34.6878321Z I1204 09:16:02.206000 21989 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 22044 2025-12-04T09:19:34.6879860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6880016Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6881532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6881687Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6883200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.6883353Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6884863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6885020Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6885477Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6885952Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6886850Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6887303Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6888195Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6888553Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6889408Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6889851Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6890702Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6891199Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6892054Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6892463Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6893321Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6893756Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6895366Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.6895695Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6896353Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6897805Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6898182Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6898908Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6899550Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6900007Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6900537Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6901553Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6902070Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6903075Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6903475Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6904435Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6904935Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6905954Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6906460Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6907418Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6907881Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6908941Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6909489Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6911107Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 583991296 and is now 630128640. 2025-12-04T09:19:34.6911432Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6912025Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6913191Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6913573Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6914212Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6914703Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6915111Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6915583Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6916481Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6916936Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6917820Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6918176Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6919032Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6919524Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6920380Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6920955Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6922066Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6922523Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6923498Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6923992Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6925820Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.6926185Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6926855Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6928285Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6928659Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6929376Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6929933Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6930388Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6930924Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6931931Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6932440Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6933440Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6934101Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6934963Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6935395Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6936304Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6936934Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6937892Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6938348Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6939315Z [rank1]:E1204 09:16:08.850000 22042 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6939803Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6941610Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 598671360 and is now 630128640. 2025-12-04T09:19:34.6941976Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6942710Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6944018Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6944385Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6945104Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6945661Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6945768Z dist init r=3, world=4 2025-12-04T09:19:34.6945870Z dist init r=2, world=4 2025-12-04T09:19:34.6945971Z dist init r=1, world=4 2025-12-04T09:19:34.6946069Z dist init r=0, world=4 2025-12-04T09:19:34.6947229Z [rank0]:[W1204 09:16:09.901193775 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.6947338Z FAILED [8.5234s] [100%] 2025-12-04T09:19:34.6947345Z 2025-12-04T09:19:34.6947494Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6948003Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.6948124Z Traceback (most recent call last): 2025-12-04T09:19:34.6948678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6948797Z self._join_processes(fn) 2025-12-04T09:19:34.6949431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.6949563Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6950101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6950202Z raise RuntimeError(error) 2025-12-04T09:19:34.6950418Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.6950524Z Traceback (most recent call last): 2025-12-04T09:19:34.6951009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6951114Z getattr(self, test_name)() 2025-12-04T09:19:34.6951591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6951684Z fn() 2025-12-04T09:19:34.6952134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6952227Z method(*args, **kwargs) 2025-12-04T09:19:34.6952685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6952775Z method(*args, **kwargs) 2025-12-04T09:19:34.6953221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6953318Z with policy(): 2025-12-04T09:19:34.6953769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6953877Z raise RuntimeError(msg) 2025-12-04T09:19:34.6955144Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 583991296 and is now 630128640. 
2025-12-04T09:19:34.6955151Z 2025-12-04T09:19:34.6955345Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6956110Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6956114Z 2025-12-04T09:19:34.6956349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6956358Z 2025-12-04T09:19:34.6956362Z 2025-12-04T09:19:34.6956565Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6956797Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.6957565Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-578e4c4077b7a803.xml - 2025-12-04T09:19:34.6957720Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6958616Z FAILED [8.5234s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.6958734Z Traceback (most recent call last): 2025-12-04T09:19:34.6959219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6959378Z getattr(self, test_name)() 2025-12-04T09:19:34.6959856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6959939Z fn() 2025-12-04T09:19:34.6960399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6960491Z method(*args, **kwargs) 2025-12-04T09:19:34.6960939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6961045Z method(*args, **kwargs) 2025-12-04T09:19:34.6961490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6961584Z with policy(): 2025-12-04T09:19:34.6962037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6962135Z raise RuntimeError(msg) 2025-12-04T09:19:34.6963339Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 583991296 and is now 630128640. 2025-12-04T09:19:34.6963345Z 2025-12-04T09:19:34.6963538Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6964304Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6964309Z 2025-12-04T09:19:34.6964547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6964710Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
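Note: the RuntimeError above is raised by PyTorch's CUDA memory-leak checker (the __exit__ frame in common_utils.py), which snapshots the caching-allocator and driver memory counters before the test body and compares them again afterwards; the "was X and is now Y" numbers in the message are those two snapshots. The sketch below only illustrates that before/after idea using public torch.cuda counters; it is not the checker that common_utils.py actually implements, and the class name is made up for illustration.

    import torch

    class NaiveCudaLeakCheck:
        """Illustrative before/after CUDA memory comparison on one device."""

        def __init__(self, device=0):
            self.device = device

        def __enter__(self):
            torch.cuda.synchronize(self.device)
            self.alloc_before = torch.cuda.memory_allocated(self.device)
            free, total = torch.cuda.mem_get_info(self.device)
            self.driver_before = total - free
            return self

        def __exit__(self, exc_type, exc, tb):
            torch.cuda.synchronize(self.device)
            torch.cuda.empty_cache()  # drop cached blocks before re-reading the counters
            alloc_after = torch.cuda.memory_allocated(self.device)
            free, total = torch.cuda.mem_get_info(self.device)
            driver_after = total - free
            if alloc_after > self.alloc_before or driver_after > self.driver_before:
                raise RuntimeError(
                    f"possible CUDA leak on device {self.device}: caching allocator "
                    f"{self.alloc_before} -> {alloc_after} bytes, driver "
                    f"{self.driver_before} -> {driver_after} bytes"
                )

The real checker is opt-in per run via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, which is exactly what the repro command above sets, and the repro banner itself can be silenced with PYTORCH_PRINT_REPRO_ON_FAILURE=0.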
2025-12-04T09:19:34.6964876Z ======================= 1 failed, 7 deselected in 8.55s ========================
2025-12-04T09:19:34.6964963Z Got exit code 1
2025-12-04T09:19:34.6965702Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda
2025-12-04T09:19:34.6966067Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:19:34.6966674Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14d4a314808f55fe.xml
2025-12-04T09:19:34.6966829Z ============================= test session starts ==============================
2025-12-04T09:19:34.6967141Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:19:34.6967249Z cachedir: .pytest_cache
2025-12-04T09:19:34.6967703Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:19:34.6967814Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:19:34.6967920Z configfile: pytest.ini
2025-12-04T09:19:34.6968402Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:19:34.6968587Z collecting ... collected 8 items / 3 deselected / 5 selected
2025-12-04T09:19:34.6968718Z stepcurrent: skipping 3 already run items.
2025-12-04T09:19:34.6968819Z Running 5 items in this shard
2025-12-04T09:19:34.6968824Z
2025-12-04T09:19:34.6969916Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda I1204 09:16:15.504000 22326 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 22378
2025-12-04T09:19:34.6970411Z I1204 09:16:15.505000 22326 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 22379
2025-12-04T09:19:34.6970866Z I1204 09:16:15.506000 22326 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 22380
2025-12-04T09:19:34.6971299Z I1204 09:16:15.507000 22326 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 22381
2025-12-04T09:19:34.6972825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.6972982Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.6974507Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index.
2025-12-04T09:19:34.6974670Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6976177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6976413Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6978323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6978495Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6978955Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6979488Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6980493Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6981003Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6982004Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6982402Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6983363Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6983860Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6984821Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6985379Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6986346Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6986803Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6987770Z [rank0]:E1204 09:16:22.331000 22378 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6988263Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6990044Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.6990375Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6990965Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6992125Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.6992509Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6993149Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6993642Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6994042Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6994514Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6995414Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6995868Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6996756Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6997107Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6997977Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6998518Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6999378Z [rank3]:E1204 09:16:22.333000 22381 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6999997Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7000900Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7001333Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7002245Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7002712Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7004421Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 491716608 and is now 630128640. 2025-12-04T09:19:34.7004769Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7005405Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7006683Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7007042Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7007717Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7008405Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7008851Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7009367Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7010357Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7010852Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7011816Z [rank1]:E1204 09:16:22.333000 22379 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7012203Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7013201Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7013678Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7014607Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7015095Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7016021Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7016545Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7017675Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7018180Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7019989Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.7020361Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7021291Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7022604Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7022979Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7023695Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7024255Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7024711Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7025245Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7026253Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7026763Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7027829Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7028231Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7029202Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7029689Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7030654Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7031158Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7032117Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7032683Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7033703Z [rank2]:E1204 09:16:22.335000 22380 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7034180Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7035922Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7036269Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7036892Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7038117Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7038568Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7039210Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7039706Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7039800Z dist init r=1, world=4 2025-12-04T09:19:34.7039886Z dist init r=2, world=4 2025-12-04T09:19:34.7039979Z dist init r=0, world=4 2025-12-04T09:19:34.7040066Z dist init r=3, world=4 2025-12-04T09:19:34.7041092Z [rank0]:[W1204 09:16:22.343252882 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7041242Z FAILED [8.4208s] [ 20%] 2025-12-04T09:19:34.7041247Z 2025-12-04T09:19:34.7041378Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7041780Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7041889Z Traceback (most recent call last): 2025-12-04T09:19:34.7042374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7042484Z self._join_processes(fn) 2025-12-04T09:19:34.7043004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7043139Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7043676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7043778Z raise RuntimeError(error) 2025-12-04T09:19:34.7043996Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7044100Z Traceback (most recent call last): 2025-12-04T09:19:34.7044583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7044696Z getattr(self, test_name)() 2025-12-04T09:19:34.7045172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7045259Z fn() 2025-12-04T09:19:34.7045709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7045802Z method(*args, **kwargs) 2025-12-04T09:19:34.7046262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7046357Z method(*args, **kwargs) 2025-12-04T09:19:34.7046805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7046904Z with policy(): 2025-12-04T09:19:34.7047404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7047507Z raise RuntimeError(msg) 2025-12-04T09:19:34.7048707Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7048712Z 2025-12-04T09:19:34.7048920Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7049683Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7049692Z 2025-12-04T09:19:34.7049928Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7049933Z 2025-12-04T09:19:34.7049941Z 2025-12-04T09:19:34.7050151Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7050386Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7051155Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14d4a314808f55fe.xml - 2025-12-04T09:19:34.7051309Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7052205Z FAILED [8.4208s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7052388Z Traceback (most recent call last): 2025-12-04T09:19:34.7052880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7052993Z getattr(self, test_name)() 2025-12-04T09:19:34.7053475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7053555Z fn() 2025-12-04T09:19:34.7054025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7054117Z method(*args, **kwargs) 2025-12-04T09:19:34.7054575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7054671Z method(*args, **kwargs) 2025-12-04T09:19:34.7055122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7055224Z with policy(): 2025-12-04T09:19:34.7055681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7055777Z raise RuntimeError(msg) 2025-12-04T09:19:34.7057275Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7057282Z 2025-12-04T09:19:34.7057497Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7058358Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7058370Z 2025-12-04T09:19:34.7058636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7058827Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
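For context on the "Process 0 exited with error code 10 and exception: ..." wording above: the distributed test harness starts one worker process per rank, joins them, and re-raises a failing child's exit code and traceback in the parent (the _join_processes / _check_return_codes frames in common_distributed.py). Below is a rough sketch of that join-and-check pattern, assuming a user-supplied run_one_rank(rank, world_size) body; it is not the harness in common_distributed.py itself.

    import torch.multiprocessing as mp

    def run_multiprocess_test(run_one_rank, world_size=4):
        # Start one process per rank with the "spawn" start method (the start
        # method generally needed when workers use CUDA), wait for all of
        # them, then surface a non-zero exit code as a RuntimeError here.
        ctx = mp.get_context("spawn")
        procs = [
            ctx.Process(target=run_one_rank, args=(rank, world_size))
            for rank in range(world_size)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

In the log above, exit code 10 is what each rank exits with after the leak check raises ("exiting process N with exit code: 10"), and the child's traceback is forwarded along with it, which is why the same RuntimeError text appears once per rank and again in the parent's failure report.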
2025-12-04T09:19:34.7059071Z ======================= 1 failed, 3 deselected in 8.44s ========================
2025-12-04T09:19:34.7059171Z Got exit code 1
2025-12-04T09:19:34.7059291Z Retrying single test...
2025-12-04T09:19:34.7059976Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-72b90a4f7545df10.xml
2025-12-04T09:19:34.7060139Z ============================= test session starts ==============================
2025-12-04T09:19:34.7060501Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:19:34.7060607Z cachedir: .pytest_cache
2025-12-04T09:19:34.7061139Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:19:34.7061262Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:19:34.7061370Z configfile: pytest.ini
2025-12-04T09:19:34.7061922Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:19:34.7062124Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T09:19:34.7063058Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda
2025-12-04T09:19:34.7063186Z Running 1 items in this shard
2025-12-04T09:19:34.7063191Z
2025-12-04T09:19:34.7064409Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda I1204 09:16:28.824000 22663 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 22715
2025-12-04T09:19:34.7064980Z I1204 09:16:28.825000 22663 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 22716
2025-12-04T09:19:34.7065476Z I1204 09:16:28.826000 22663 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 22717
2025-12-04T09:19:34.7065980Z I1204 09:16:28.826000 22663 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 22718
2025-12-04T09:19:34.7067711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.7067883Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.7069735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.7069881Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.7071415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index.
FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7071566Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7073332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7073488Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7073928Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7074434Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7075385Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7075878Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7076819Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7077198Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7078112Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7078579Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7079533Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7079995Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7080898Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7081319Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7082238Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7082708Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7084419Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7084765Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7085387Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7086677Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7087114Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7087763Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7088245Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7088659Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7089134Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7090032Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7090496Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7091374Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7091737Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7092585Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7093081Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7093928Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7094360Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7095218Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7095616Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7096544Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7097194Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7099004Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 491716608 and is now 630128640. 2025-12-04T09:19:34.7099374Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7100040Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7101443Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7101808Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7102537Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7103085Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7103551Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7104091Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7105097Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7105619Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7106608Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7107071Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7108037Z 
[rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7108649Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7109641Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7110072Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7110933Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7111334Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7112199Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7112638Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7114247Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7114627Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7115223Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7116386Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7116710Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7117360Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7117847Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7118258Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7118729Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7119619Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7120079Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7121539Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7121963Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7122928Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7123431Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7124393Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7124888Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7125870Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7126315Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7127299Z [rank2]:E1204 09:16:35.577000 22717 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7127792Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7129705Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7130074Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7130745Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7132063Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7132428Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7133270Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7133905Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7134011Z dist init r=3, world=4 2025-12-04T09:19:34.7134107Z dist init r=0, world=4 2025-12-04T09:19:34.7134200Z dist init r=2, world=4 2025-12-04T09:19:34.7134301Z dist init r=1, world=4 2025-12-04T09:19:34.7135400Z [rank0]:[W1204 09:16:35.589301494 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7135567Z FAILED [8.4881s] [100%] 2025-12-04T09:19:34.7135573Z 2025-12-04T09:19:34.7135718Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7136133Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7136326Z Traceback (most recent call last): 2025-12-04T09:19:34.7137027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7137142Z self._join_processes(fn) 2025-12-04T09:19:34.7137745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7137887Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7138507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7138621Z raise RuntimeError(error) 2025-12-04T09:19:34.7138858Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7138997Z Traceback (most recent call last): 2025-12-04T09:19:34.7139544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7139659Z getattr(self, test_name)() 2025-12-04T09:19:34.7140205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7140292Z fn() 2025-12-04T09:19:34.7140812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7140918Z method(*args, **kwargs) 2025-12-04T09:19:34.7141430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7141544Z method(*args, **kwargs) 2025-12-04T09:19:34.7142113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7142222Z with policy(): 2025-12-04T09:19:34.7142731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7142840Z raise RuntimeError(msg) 2025-12-04T09:19:34.7144200Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.7144207Z 2025-12-04T09:19:34.7144421Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7145287Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7145293Z 2025-12-04T09:19:34.7145563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7145568Z 2025-12-04T09:19:34.7145573Z 2025-12-04T09:19:34.7145792Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7146065Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7146928Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-72b90a4f7545df10.xml - 2025-12-04T09:19:34.7147108Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7148173Z FAILED [8.4881s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7148292Z Traceback (most recent call last): 2025-12-04T09:19:34.7148954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7149056Z getattr(self, test_name)() 2025-12-04T09:19:34.7149545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7149627Z fn() 2025-12-04T09:19:34.7150078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7150180Z method(*args, **kwargs) 2025-12-04T09:19:34.7150630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7150737Z method(*args, **kwargs) 2025-12-04T09:19:34.7151186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7151278Z with policy(): 2025-12-04T09:19:34.7151749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7151850Z raise RuntimeError(msg) 2025-12-04T09:19:34.7153046Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7153062Z 2025-12-04T09:19:34.7153254Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7154020Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7154025Z 2025-12-04T09:19:34.7154274Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7154500Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:19:34.7154673Z ======================= 1 failed, 7 deselected in 8.51s ======================== 2025-12-04T09:19:34.7154761Z Got exit code 1 2025-12-04T09:19:34.7154859Z Retrying single test... 2025-12-04T09:19:34.7155478Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cc094df1219cfd82.xml 2025-12-04T09:19:34.7155625Z ============================= test session starts ============================== 2025-12-04T09:19:34.7155938Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7156051Z cachedir: .pytest_cache 2025-12-04T09:19:34.7156512Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7156630Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7156729Z configfile: pytest.ini 2025-12-04T09:19:34.7157206Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7157400Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7158230Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7158347Z Running 1 items in this shard 2025-12-04T09:19:34.7158351Z 2025-12-04T09:19:34.7159422Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda I1204 09:16:42.174000 23000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 23052 2025-12-04T09:19:34.7159930Z I1204 09:16:42.175000 23000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 23053 2025-12-04T09:19:34.7160379Z I1204 09:16:42.176000 23000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 23054 2025-12-04T09:19:34.7160814Z I1204 09:16:42.176000 23000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 23055 2025-12-04T09:19:34.7162354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7162508Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7164056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7164204Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7165721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. 
FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7165879Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7167437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7167591Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7168003Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7168492Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7169380Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7169842Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7170721Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7171073Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7171938Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7172426Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7173292Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7173725Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7174580Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7174986Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7175844Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7176360Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7178298Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7178675Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7179338Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7180731Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7181098Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7181814Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7182371Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7182824Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7183367Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7184371Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7184888Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7185878Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7186274Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7187299Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7187796Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7188873Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7189418Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7190279Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7190680Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7191538Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7191984Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7193572Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.7193916Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7194548Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7195730Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7196055Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7196688Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7197187Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7197593Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7198075Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7198966Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7199426Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7200308Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7200727Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7201588Z 
[rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7202021Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7202883Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7203322Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7204182Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7204578Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7205433Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7205880Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7207517Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7207860Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7208449Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7209619Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7209948Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7210581Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7211079Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7211477Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7211962Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7212852Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7213416Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7214305Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7214659Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7215522Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7215955Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7217102Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7217613Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7218586Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7219031Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7219997Z [rank2]:E1204 09:16:48.852000 23054 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7220510Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7222601Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7222984Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7223651Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7224970Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7225340Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7226056Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7226612Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7226716Z dist init r=0, world=4 2025-12-04T09:19:34.7226829Z dist init r=3, world=4 2025-12-04T09:19:34.7226929Z dist init r=2, world=4 2025-12-04T09:19:34.7227024Z dist init r=1, world=4 2025-12-04T09:19:34.7228264Z [rank0]:[W1204 09:16:49.864692706 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7228371Z FAILED [9.1002s] [100%] 2025-12-04T09:19:34.7228377Z 2025-12-04T09:19:34.7228536Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7228983Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7229099Z Traceback (most recent call last): 2025-12-04T09:19:34.7229663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7229775Z self._join_processes(fn) 2025-12-04T09:19:34.7230370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7230529Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7231136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7231266Z raise RuntimeError(error) 2025-12-04T09:19:34.7231509Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7231627Z Traceback (most recent call last): 2025-12-04T09:19:34.7232187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7232297Z getattr(self, test_name)() 2025-12-04T09:19:34.7232931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7233015Z fn() 2025-12-04T09:19:34.7233494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7233608Z method(*args, **kwargs) 2025-12-04T09:19:34.7234079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7234178Z method(*args, **kwargs) 2025-12-04T09:19:34.7234717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7234809Z with policy(): 2025-12-04T09:19:34.7235300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7235504Z raise RuntimeError(msg) 2025-12-04T09:19:34.7236700Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7236709Z 2025-12-04T09:19:34.7236913Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7237673Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7237678Z 2025-12-04T09:19:34.7237924Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7237928Z 2025-12-04T09:19:34.7238070Z Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7238179Z Traceback (most recent call last): 2025-12-04T09:19:34.7238681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7238782Z getattr(self, test_name)() 2025-12-04T09:19:34.7239272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7239401Z fn() 2025-12-04T09:19:34.7239850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7239954Z method(*args, **kwargs) 2025-12-04T09:19:34.7240404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7240495Z method(*args, **kwargs) 2025-12-04T09:19:34.7240950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7241035Z with policy(): 2025-12-04T09:19:34.7241498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7241592Z raise RuntimeError(msg) 2025-12-04T09:19:34.7242786Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.7242805Z 2025-12-04T09:19:34.7242999Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7243754Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7243759Z 2025-12-04T09:19:34.7244002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7244007Z 2025-12-04T09:19:34.7244149Z Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7244266Z Traceback (most recent call last): 2025-12-04T09:19:34.7244752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7244852Z getattr(self, test_name)() 2025-12-04T09:19:34.7245339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7245417Z fn() 2025-12-04T09:19:34.7245909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7246018Z method(*args, **kwargs) 2025-12-04T09:19:34.7246469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7246570Z method(*args, **kwargs) 2025-12-04T09:19:34.7247021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7247109Z with policy(): 2025-12-04T09:19:34.7247571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7247674Z raise RuntimeError(msg) 2025-12-04T09:19:34.7248874Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.7248879Z 2025-12-04T09:19:34.7249073Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7249829Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7249833Z 2025-12-04T09:19:34.7250082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7250086Z 2025-12-04T09:19:34.7250090Z 2025-12-04T09:19:34.7250354Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7250600Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.7251375Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cc094df1219cfd82.xml - 2025-12-04T09:19:34.7251529Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7252443Z FAILED [9.1002s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7252555Z Traceback (most recent call last): 2025-12-04T09:19:34.7253061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7253164Z getattr(self, test_name)() 2025-12-04T09:19:34.7253644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7253734Z fn() 2025-12-04T09:19:34.7254187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7254293Z method(*args, **kwargs) 2025-12-04T09:19:34.7254746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7254839Z method(*args, **kwargs) 2025-12-04T09:19:34.7255294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7255385Z with policy(): 2025-12-04T09:19:34.7255838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7255954Z raise RuntimeError(msg) 2025-12-04T09:19:34.7257549Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7257557Z 2025-12-04T09:19:34.7257790Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7258647Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7258652Z 2025-12-04T09:19:34.7258929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7258935Z 2025-12-04T09:19:34.7259103Z Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7259224Z Traceback (most recent call last): 2025-12-04T09:19:34.7259795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7259908Z getattr(self, test_name)() 2025-12-04T09:19:34.7260458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7260550Z fn() 2025-12-04T09:19:34.7261059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7261176Z method(*args, **kwargs) 2025-12-04T09:19:34.7261685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7261788Z method(*args, **kwargs) 2025-12-04T09:19:34.7262303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7262455Z with policy(): 2025-12-04T09:19:34.7262979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7263087Z raise RuntimeError(msg) 2025-12-04T09:19:34.7264429Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.7264435Z 2025-12-04T09:19:34.7264661Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7265509Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7265515Z 2025-12-04T09:19:34.7265790Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7265799Z 2025-12-04T09:19:34.7265962Z Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7266079Z Traceback (most recent call last): 2025-12-04T09:19:34.7266643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7266756Z getattr(self, test_name)() 2025-12-04T09:19:34.7267305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7267396Z fn() 2025-12-04T09:19:34.7267903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7268020Z method(*args, **kwargs) 2025-12-04T09:19:34.7268532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7268636Z method(*args, **kwargs) 2025-12-04T09:19:34.7269223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7269313Z with policy(): 2025-12-04T09:19:34.7269820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7269919Z raise RuntimeError(msg) 2025-12-04T09:19:34.7271109Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.7271126Z 2025-12-04T09:19:34.7271320Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7272071Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7272079Z 2025-12-04T09:19:34.7272326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7272492Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
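The leak reports above all follow the same pattern: the per-test leak check in common_utils.py snapshots each device's memory before the test, runs the test, and then compares the caching-allocator bytes and the driver-level allocation afterwards ("Caching allocator allocated memory was 512 and is now reported as 3584 ... CUDA driver allocated memory was 604962816 and is now 630128640"). A minimal sketch of that comparison using only public torch.cuda APIs (this is an illustrative simplification, not the actual internal leak-check context manager) could look like:

    import torch

    def snapshot(device):
        # Bytes currently held by the caching allocator on this device.
        allocator = torch.cuda.memory_allocated(device)
        # Driver-level usage: total minus free, as reported by cudaMemGetInfo.
        free, total = torch.cuda.mem_get_info(device)
        return allocator, total - free

    def check_for_leak(fn, device=0):
        # Hypothetical helper: run `fn` and flag memory that survives it.
        torch.cuda.synchronize(device)
        before_alloc, before_driver = snapshot(device)
        fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()  # release cached blocks so driver numbers are comparable
        after_alloc, after_driver = snapshot(device)
        if after_alloc > before_alloc and after_driver > before_driver:
            raise RuntimeError(
                f"possible leak on device {device}: allocator "
                f"{before_alloc} -> {after_alloc}, driver {before_driver} -> {after_driver}"
            )

Only when both the allocator count and the driver-reported usage grow is the leak treated as "confirmed", which matches the wording of the RuntimeError printed for every rank in this log.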
2025-12-04T09:19:34.7272664Z ======================= 1 failed, 7 deselected in 9.12s ======================== 2025-12-04T09:19:34.7272749Z Got exit code 1 2025-12-04T09:19:34.7273437Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7273809Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.7274414Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-94627d53ab92538d.xml 2025-12-04T09:19:34.7274609Z ============================= test session starts ============================== 2025-12-04T09:19:34.7274929Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7275020Z cachedir: .pytest_cache 2025-12-04T09:19:34.7275484Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7275591Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7275681Z configfile: pytest.ini 2025-12-04T09:19:34.7276166Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7276349Z collecting ... collected 8 items / 4 deselected / 4 selected 2025-12-04T09:19:34.7276474Z stepcurrent: skipping 4 already run items. 2025-12-04T09:19:34.7276577Z Running 4 items in this shard 2025-12-04T09:19:34.7276582Z 2025-12-04T09:19:34.7277652Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda I1204 09:16:55.474000 23337 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 23389 2025-12-04T09:19:34.7278110Z I1204 09:16:55.475000 23337 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 23390 2025-12-04T09:19:34.7278545Z I1204 09:16:55.476000 23337 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 23391 2025-12-04T09:19:34.7278984Z I1204 09:16:55.477000 23337 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 23392 2025-12-04T09:19:34.7280515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7280667Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7282242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.7282389Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7283922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7284073Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7285594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7285738Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7286154Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7286633Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7287580Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7288043Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7288928Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7289288Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7290145Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7290590Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7291442Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7291880Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7292744Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7293138Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7293997Z [rank0]:E1204 09:17:02.215000 23389 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7294494Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7296100Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7296519Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7297340Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7298666Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7299034Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7299765Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7300313Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7300846Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7301377Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7302388Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7302911Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7303898Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7304305Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7305273Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7305772Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7306740Z [rank2]:E1204 09:17:02.220000 23391 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7307230Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7308191Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7308745Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7309829Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7310291Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7312000Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7312349Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7312971Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7314213Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7314558Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7315235Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7315803Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7316236Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7316912Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7317881Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7318383Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7319350Z [rank1]:E1204 09:17:02.220000 23390 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7319743Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7320678Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7321300Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7322418Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7322909Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7323972Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7324420Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7325393Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7325881Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7327703Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:19:34.7328066Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7328735Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7330039Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7330471Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7331203Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7331749Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7332209Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7332843Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7333921Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7334414Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7335344Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7335729Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7336864Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7337358Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7338323Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7338864Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7339835Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7340281Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7341256Z [rank3]:E1204 09:17:02.222000 23392 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7341754Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7343564Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 491716608 and is now 630128640. 2025-12-04T09:19:34.7343931Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7344596Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7345955Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7346324Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7347050Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7347601Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7347710Z dist init r=1, world=4 2025-12-04T09:19:34.7347807Z dist init r=0, world=4 2025-12-04T09:19:34.7347907Z dist init r=3, world=4 2025-12-04T09:19:34.7348012Z dist init r=2, world=4 2025-12-04T09:19:34.7349361Z [rank0]:[W1204 09:17:02.233958987 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7349469Z FAILED [8.2942s] [ 25%] 2025-12-04T09:19:34.7349475Z 2025-12-04T09:19:34.7349610Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7350025Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.7350155Z Traceback (most recent call last): 2025-12-04T09:19:34.7350672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7350779Z self._join_processes(fn) 2025-12-04T09:19:34.7351337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7351474Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7352053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7352230Z raise RuntimeError(error) 2025-12-04T09:19:34.7352452Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7352570Z Traceback (most recent call last): 2025-12-04T09:19:34.7353078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7353181Z getattr(self, test_name)() 2025-12-04T09:19:34.7353692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7353773Z fn() 2025-12-04T09:19:34.7354264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7354363Z method(*args, **kwargs) 2025-12-04T09:19:34.7354841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7354951Z method(*args, **kwargs) 2025-12-04T09:19:34.7355529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7355613Z with policy(): 2025-12-04T09:19:34.7356081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7356177Z raise RuntimeError(msg) 2025-12-04T09:19:34.7357378Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:19:34.7363826Z 2025-12-04T09:19:34.7364080Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7364859Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7364864Z 2025-12-04T09:19:34.7365110Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7365115Z 2025-12-04T09:19:34.7365119Z 2025-12-04T09:19:34.7365317Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7365557Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7366328Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-94627d53ab92538d.xml - 2025-12-04T09:19:34.7366485Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7367399Z FAILED [8.2942s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7367508Z Traceback (most recent call last): 2025-12-04T09:19:34.7368009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7368109Z getattr(self, test_name)() 2025-12-04T09:19:34.7368588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7368673Z fn() 2025-12-04T09:19:34.7369123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7369227Z method(*args, **kwargs) 2025-12-04T09:19:34.7369672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7369764Z method(*args, **kwargs) 2025-12-04T09:19:34.7370277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7370365Z with policy(): 2025-12-04T09:19:34.7370821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7370931Z raise RuntimeError(msg) 2025-12-04T09:19:34.7372125Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:19:34.7372135Z 2025-12-04T09:19:34.7372334Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7373100Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7373105Z 2025-12-04T09:19:34.7373350Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7373508Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
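The leak checker behind the failure above reports two numbers per device: the caching-allocator bytes and the driver-level bytes, measured before and after the test body. The following is a minimal, hypothetical sketch of that kind of before/after comparison, assuming a helper named naive_cuda_leak_check invented for illustration; it is not the actual CudaMemoryLeakCheck context manager in torch/testing/_internal/common_utils.py that produced the message.

# Minimal sketch (assumption, not the real CudaMemoryLeakCheck) of the comparison
# behind "Caching allocator allocated memory was X and is now reported as Y ...
# CUDA driver allocated memory was A and is now B".
import contextlib
import torch

@contextlib.contextmanager
def naive_cuda_leak_check(device: int = 0):
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before                  # driver-level bytes in use
    try:
        yield
    finally:
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after} bytes, "
                f"driver {driver_before} -> {driver_after} bytes"
            )

# Usage: a tensor that is still referenced when the block exits shows up as a "leak".
if torch.cuda.is_available():
    kept = []
    try:
        with naive_cuda_leak_check(0):
            kept.append(torch.ones(1024, device="cuda:0"))  # still referenced afterwards
    except RuntimeError as exc:
        print(exc)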
2025-12-04T09:19:34.7373664Z ======================= 1 failed, 4 deselected in 8.32s ======================== 2025-12-04T09:19:34.7373760Z Got exit code 1 2025-12-04T09:19:34.7373852Z Retrying single test... 2025-12-04T09:19:34.7374458Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f49c40cee39994b2.xml 2025-12-04T09:19:34.7374604Z ============================= test session starts ============================== 2025-12-04T09:19:34.7374974Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7375076Z cachedir: .pytest_cache 2025-12-04T09:19:34.7375537Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7375646Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7375745Z configfile: pytest.ini 2025-12-04T09:19:34.7376333Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7376526Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7377613Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7377730Z Running 1 items in this shard 2025-12-04T09:19:34.7377736Z 2025-12-04T09:19:34.7378955Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda I1204 09:17:08.664000 23674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 23726 2025-12-04T09:19:34.7379453Z I1204 09:17:08.665000 23674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 23727 2025-12-04T09:19:34.7379956Z I1204 09:17:08.666000 23674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 23728 2025-12-04T09:19:34.7380443Z I1204 09:17:08.666000 23674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 23729 2025-12-04T09:19:34.7382167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7382408Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7384130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7384304Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7386006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. 
FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7386182Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7387886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7388055Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7388622Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7389281Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7390183Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7390639Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7391526Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7391880Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7392740Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7393180Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7394028Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7394465Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7395320Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7395732Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7396642Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7397089Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7398685Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7399020Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7399600Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7400764Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7401093Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7401731Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7402217Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7402685Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7403157Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7404053Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7404506Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7405388Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7405746Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7406610Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7407040Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7407889Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7408326Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7409183Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7409639Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7410494Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7410933Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7412543Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 598671360 and is now 630128640. 2025-12-04T09:19:34.7412881Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7413464Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7414630Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7414963Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7415646Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7416144Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7416788Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7417317Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7418327Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7418838Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7419845Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7420243Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7421420Z 
[rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7421914Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7422874Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7423378Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7424510Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7424967Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7425932Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7426432Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7428242Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:19:34.7428615Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7429273Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7430576Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7431018Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7431738Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7432288Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7432846Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7433319Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7434212Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7434668Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7435553Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7435908Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7436770Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7437204Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7438109Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7438546Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7439396Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7439801Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7440659Z [rank2]:E1204 09:17:15.377000 23728 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7441106Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7442698Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 609157120 and is now 630128640. 2025-12-04T09:19:34.7443030Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7443613Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7444825Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7445154Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7445787Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7446282Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7446370Z dist init r=3, world=4 2025-12-04T09:19:34.7446462Z dist init r=2, world=4 2025-12-04T09:19:34.7446557Z dist init r=1, world=4 2025-12-04T09:19:34.7446641Z dist init r=0, world=4 2025-12-04T09:19:34.7447681Z [rank0]:[W1204 09:17:15.411024733 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7447766Z FAILED [8.6960s] [100%] 2025-12-04T09:19:34.7447772Z 2025-12-04T09:19:34.7447904Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7448302Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.7448406Z Traceback (most recent call last): 2025-12-04T09:19:34.7448900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7448998Z self._join_processes(fn) 2025-12-04T09:19:34.7449525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7449663Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7450246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7450354Z raise RuntimeError(error) 2025-12-04T09:19:34.7450568Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7450674Z Traceback (most recent call last): 2025-12-04T09:19:34.7451158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7451255Z getattr(self, test_name)() 2025-12-04T09:19:34.7451729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7451823Z fn() 2025-12-04T09:19:34.7452270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7452363Z method(*args, **kwargs) 2025-12-04T09:19:34.7452818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7452911Z method(*args, **kwargs) 2025-12-04T09:19:34.7453361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7453447Z with policy(): 2025-12-04T09:19:34.7453896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7453997Z raise RuntimeError(msg) 2025-12-04T09:19:34.7455191Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 609157120 and is now 630128640. 
2025-12-04T09:19:34.7455258Z 2025-12-04T09:19:34.7455456Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7456294Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7456300Z 2025-12-04T09:19:34.7456538Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7456551Z 2025-12-04T09:19:34.7456554Z 2025-12-04T09:19:34.7456937Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7457200Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7458081Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f49c40cee39994b2.xml - 2025-12-04T09:19:34.7458252Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7459284Z FAILED [8.6960s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7459405Z Traceback (most recent call last): 2025-12-04T09:19:34.7459964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7460084Z getattr(self, test_name)() 2025-12-04T09:19:34.7460617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7460705Z fn() 2025-12-04T09:19:34.7461224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7461331Z method(*args, **kwargs) 2025-12-04T09:19:34.7461843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7462004Z method(*args, **kwargs) 2025-12-04T09:19:34.7462506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7462609Z with policy(): 2025-12-04T09:19:34.7463116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7463224Z raise RuntimeError(msg) 2025-12-04T09:19:34.7464585Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 609157120 and is now 630128640. 2025-12-04T09:19:34.7464595Z 2025-12-04T09:19:34.7464810Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7465665Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7465671Z 2025-12-04T09:19:34.7465933Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7466120Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
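The UserWarning from torch/distributed/fsdp/_init_utils.py that precedes each failing run suggests two fixes: call torch.cuda.set_device() before FSDP initialization, or pass an indexed device as device_id instead of the bare "cuda" string. Below is a minimal sketch of both, assuming a single-node NCCL rendezvous and a placeholder nn.Linear model; setup_fsdp is a hypothetical helper, not part of the test in this log.

# Hedged sketch of the two fixes named in the FSDP `device_id` UserWarning above.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def setup_fsdp(rank: int, world_size: int) -> FSDP:
    # Placeholder single-node rendezvous settings (assumptions for the sketch).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Fix 1: make the rank's device current before any FSDP initialization.
    torch.cuda.set_device(rank)

    model = nn.Linear(8, 8).cuda(rank)

    # Fix 2: pass an indexed device rather than the bare "cuda" string, so FSDP
    # does not have to fall back to whatever the current device happens to be.
    return FSDP(model, device_id=torch.device("cuda", rank))

Passing torch.device("cuda", rank) removes the guess the warning describes, since FSDP no longer has to infer which device the rank owns.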
2025-12-04T09:19:34.7466295Z ======================= 1 failed, 7 deselected in 8.72s ======================== 2025-12-04T09:19:34.7466393Z Got exit code 1 2025-12-04T09:19:34.7466503Z Retrying single test... 2025-12-04T09:19:34.7467185Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a8869f6ed51873ac.xml 2025-12-04T09:19:34.7467404Z ============================= test session starts ============================== 2025-12-04T09:19:34.7467758Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7467869Z cachedir: .pytest_cache 2025-12-04T09:19:34.7468389Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7468615Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7468717Z configfile: pytest.ini 2025-12-04T09:19:34.7469330Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7469512Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7470343Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7470447Z Running 1 items in this shard 2025-12-04T09:19:34.7470452Z 2025-12-04T09:19:34.7471530Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda I1204 09:17:21.964000 24011 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 24063 2025-12-04T09:19:34.7471982Z I1204 09:17:21.965000 24011 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 24064 2025-12-04T09:19:34.7472421Z I1204 09:17:21.966000 24011 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 24065 2025-12-04T09:19:34.7472868Z I1204 09:17:21.967000 24011 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 24066 2025-12-04T09:19:34.7474443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7474605Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7476117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7476263Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7477777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. 
FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7477922Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7479440Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7479582Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7480000Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7480525Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7481422Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7481879Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7482754Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7483112Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7483972Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7484422Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7485274Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7485702Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7486561Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7486960Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7487875Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7488312Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7489924Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7490252Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7490850Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7492010Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7492334Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7492980Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7493511Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7493924Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7494394Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7495293Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7495743Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7496864Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7497276Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7498243Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7498732Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7499693Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7500190Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7501533Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7501986Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7502959Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7503448Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7505260Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 485425152 and is now 630128640. 2025-12-04T09:19:34.7505626Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7506301Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7507601Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7508024Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7508852Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7509452Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7509856Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7510321Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7511215Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7511672Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7512546Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7512900Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7513749Z 
[rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7514181Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7515036Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7515525Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7516375Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7516771Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7517633Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7518070Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7519678Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:19:34.7519999Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7520589Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7522139Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7522609Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7523332Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7523873Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7524328Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7524857Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7525875Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7526380Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7527362Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7527766Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7528723Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7529221Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7530255Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7530753Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7531715Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7532162Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7533141Z [rank2]:E1204 09:17:28.802000 24065 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7533732Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7535475Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7535798Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7536512Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7537984Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7538349Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7539074Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7539618Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7539734Z dist init r=0, world=4 2025-12-04T09:19:34.7539832Z dist init r=3, world=4 2025-12-04T09:19:34.7539927Z dist init r=2, world=4 2025-12-04T09:19:34.7540033Z dist init r=1, world=4 2025-12-04T09:19:34.7541192Z [rank0]:[W1204 09:17:29.808444441 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7541303Z FAILED [8.3955s] [100%] 2025-12-04T09:19:34.7541309Z 2025-12-04T09:19:34.7541455Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7541898Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.7542024Z Traceback (most recent call last): 2025-12-04T09:19:34.7542570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7542694Z self._join_processes(fn) 2025-12-04T09:19:34.7543280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7543478Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7544093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7544210Z raise RuntimeError(error) 2025-12-04T09:19:34.7544442Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7544570Z Traceback (most recent call last): 2025-12-04T09:19:34.7545114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7545221Z getattr(self, test_name)() 2025-12-04T09:19:34.7545759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7545850Z fn() 2025-12-04T09:19:34.7546358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7546470Z method(*args, **kwargs) 2025-12-04T09:19:34.7546969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7547078Z method(*args, **kwargs) 2025-12-04T09:19:34.7547578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7547674Z with policy(): 2025-12-04T09:19:34.7548188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7548294Z raise RuntimeError(msg) 2025-12-04T09:19:34.7549808Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 485425152 and is now 630128640. 
2025-12-04T09:19:34.7549824Z 2025-12-04T09:19:34.7550016Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7550772Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7550777Z 2025-12-04T09:19:34.7551021Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7551025Z 2025-12-04T09:19:34.7551029Z 2025-12-04T09:19:34.7551223Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7551462Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7552229Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a8869f6ed51873ac.xml - 2025-12-04T09:19:34.7552384Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7553284Z FAILED [8.3955s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7553391Z Traceback (most recent call last): 2025-12-04T09:19:34.7553886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7553984Z getattr(self, test_name)() 2025-12-04T09:19:34.7554459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7554550Z fn() 2025-12-04T09:19:34.7554997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7555093Z method(*args, **kwargs) 2025-12-04T09:19:34.7555601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7555693Z method(*args, **kwargs) 2025-12-04T09:19:34.7556144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7556229Z with policy(): 2025-12-04T09:19:34.7556678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7556777Z raise RuntimeError(msg) 2025-12-04T09:19:34.7557969Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 485425152 and is now 630128640. 2025-12-04T09:19:34.7557978Z 2025-12-04T09:19:34.7558175Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7558927Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7558932Z 2025-12-04T09:19:34.7559174Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7559333Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
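Note: the failures above come from the CUDA memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables. It snapshots per-device memory before the test body and compares it with the state afterwards; the "Caching allocator allocated memory was 512 and is now reported as 3584 ... CUDA driver allocated memory was ... and is now ..." figures are exactly that before/after pair, reported once per rank/device. The sketch below only illustrates the idea using public torch.cuda APIs; it is not the actual CudaMemoryLeakCheck implementation, and the helper name and the comparison logic are assumptions.

    import torch

    def run_with_leak_check(test_fn, device: int = 0) -> None:
        """Illustrative only: compare allocator and driver memory around a test body."""
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
        free, total = torch.cuda.mem_get_info(device)        # driver-level view (free, total)
        driver_before = total - free

        test_fn()

        torch.cuda.empty_cache()                              # release cached blocks first
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free

        # Flag a leak only when the driver-level numbers grew as well, mirroring
        # the "CUDA driver API confirmed a leak" wording in the log above.
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: allocator "
                f"{alloc_before} -> {alloc_after} bytes, driver "
                f"{driver_before} -> {driver_after} bytes"
            )

The repro command printed with each failure (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py ...) runs the same check locally; per the log, PYTORCH_PRINT_REPRO_ON_FAILURE=0 only suppresses the printed repro hint, not the check itself.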
2025-12-04T09:19:34.7559491Z ======================= 1 failed, 7 deselected in 8.42s ======================== 2025-12-04T09:19:34.7559580Z Got exit code 1 2025-12-04T09:19:34.7560259Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7560671Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.7561290Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-90a4ba7c1fd04d10.xml 2025-12-04T09:19:34.7561434Z ============================= test session starts ============================== 2025-12-04T09:19:34.7561752Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7561845Z cachedir: .pytest_cache 2025-12-04T09:19:34.7562299Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7562415Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7562510Z configfile: pytest.ini 2025-12-04T09:19:34.7562993Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7563171Z collecting ... collected 8 items / 5 deselected / 3 selected 2025-12-04T09:19:34.7563296Z stepcurrent: skipping 5 already run items. 2025-12-04T09:19:34.7563400Z Running 3 items in this shard 2025-12-04T09:19:34.7563404Z 2025-12-04T09:19:34.7564485Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda I1204 09:17:35.174000 24348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 24400 2025-12-04T09:19:34.7564933Z I1204 09:17:35.175000 24348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 24401 2025-12-04T09:19:34.7565367Z I1204 09:17:35.176000 24348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 24402 2025-12-04T09:19:34.7565801Z I1204 09:17:35.176000 24348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 24403 2025-12-04T09:19:34.7567380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7567527Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7569044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.7569192Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7570723Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7570864Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7572382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7572577Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7572982Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7573464Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7574355Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7574812Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7575688Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7576044Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7577209Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7577698Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7578661Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7579145Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7580174Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7580618Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7581576Z [rank1]:E1204 09:17:41.976000 24401 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7582068Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7583860Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:19:34.7584242Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7584898Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7586208Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7586569Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7587339Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7587885Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7588332Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7589042Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7590009Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7590511Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7591475Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7591863Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7592794Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7593266Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7594199Z [rank2]:E1204 09:17:41.978000 24402 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7594727Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7595660Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7596088Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7597021Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7597507Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7599254Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7599610Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7600249Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7601523Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7601963Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7602668Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7603194Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7603725Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7604227Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7605253Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7605712Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7606781Z [rank0]:E1204 09:17:41.978000 24400 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7607158Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7608061Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7608523Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7609495Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7609953Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7610856Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7611273Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7612182Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7612643Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7614337Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7614684Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7615353Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7616839Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7617201Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7617920Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7618461Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7618914Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7619452Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7620454Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7621184Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7622176Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7622584Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7623642Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7624134Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7625096Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7625581Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7626549Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7626996Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7627967Z [rank3]:E1204 09:17:41.979000 24403 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7628457Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7630257Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:19:34.7630700Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7631361Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7632772Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7633209Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7633847Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7634331Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7634419Z dist init r=1, world=4 2025-12-04T09:19:34.7634512Z dist init r=3, world=4 2025-12-04T09:19:34.7634595Z dist init r=0, world=4 2025-12-04T09:19:34.7634678Z dist init r=2, world=4 2025-12-04T09:19:34.7635705Z [rank0]:[W1204 09:17:42.995431379 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7635790Z FAILED [8.3118s] [ 33%] 2025-12-04T09:19:34.7635796Z 2025-12-04T09:19:34.7635928Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7636318Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7636425Z Traceback (most recent call last): 2025-12-04T09:19:34.7636914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7637112Z self._join_processes(fn) 2025-12-04T09:19:34.7637635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7637758Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7638301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7638402Z raise RuntimeError(error) 2025-12-04T09:19:34.7638608Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7638715Z Traceback (most recent call last): 2025-12-04T09:19:34.7639196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7639290Z getattr(self, test_name)() 2025-12-04T09:19:34.7639774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7639852Z fn() 2025-12-04T09:19:34.7640299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7640392Z method(*args, **kwargs) 2025-12-04T09:19:34.7640840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7640933Z method(*args, **kwargs) 2025-12-04T09:19:34.7641378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7641460Z with policy(): 2025-12-04T09:19:34.7641963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7642058Z raise RuntimeError(msg) 2025-12-04T09:19:34.7643252Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:19:34.7643263Z 2025-12-04T09:19:34.7643452Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7644213Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7644218Z 2025-12-04T09:19:34.7644456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7644465Z 2025-12-04T09:19:34.7644469Z 2025-12-04T09:19:34.7644663Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7644900Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7645667Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-90a4ba7c1fd04d10.xml - 2025-12-04T09:19:34.7645814Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7646718Z FAILED [8.3118s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7646824Z Traceback (most recent call last): 2025-12-04T09:19:34.7647316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7647415Z getattr(self, test_name)() 2025-12-04T09:19:34.7647888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7647969Z fn() 2025-12-04T09:19:34.7648467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7648562Z method(*args, **kwargs) 2025-12-04T09:19:34.7649005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7649094Z method(*args, **kwargs) 2025-12-04T09:19:34.7649543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7649625Z with policy(): 2025-12-04T09:19:34.7650070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7650171Z raise RuntimeError(msg) 2025-12-04T09:19:34.7651370Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:19:34.7651375Z 2025-12-04T09:19:34.7651567Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7652323Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7652328Z 2025-12-04T09:19:34.7652562Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7652720Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
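Note: the UserWarning from torch/distributed/fsdp/_init_utils.py that appears on every rank in the runs above is about passing the bare string "cuda" as device_id, with no device index. The warning itself names the two fixes: call torch.cuda.set_device() before FSDP initialization, or pass an explicit device index. A minimal sketch of both, assuming the process group is already initialized and rank is the local rank (the helper name and wiring are illustrative, not taken from the test):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(model: torch.nn.Module, rank: int) -> FSDP:
        # Fix 1: make the indexed device current before constructing FSDP.
        torch.cuda.set_device(rank)
        # Fix 2: pass an indexed device instead of the bare "cuda" string.
        return FSDP(model, device_id=torch.device("cuda", rank))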
2025-12-04T09:19:34.7652937Z ======================= 1 failed, 5 deselected in 8.33s ======================== 2025-12-04T09:19:34.7653026Z Got exit code 1 2025-12-04T09:19:34.7653117Z Retrying single test... 2025-12-04T09:19:34.7653730Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ccaa5b3b6bf09af7.xml 2025-12-04T09:19:34.7653880Z ============================= test session starts ============================== 2025-12-04T09:19:34.7654189Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7654290Z cachedir: .pytest_cache 2025-12-04T09:19:34.7654750Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7654863Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7654968Z configfile: pytest.ini 2025-12-04T09:19:34.7655452Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7655634Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7656542Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7656814Z Running 1 items in this shard 2025-12-04T09:19:34.7656820Z 2025-12-04T09:19:34.7658044Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda I1204 09:17:48.454000 24685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 24737 2025-12-04T09:19:34.7658542Z I1204 09:17:48.454000 24685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 24738 2025-12-04T09:19:34.7659046Z I1204 09:17:48.455000 24685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 24739 2025-12-04T09:19:34.7659540Z I1204 09:17:48.456000 24685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 24740 2025-12-04T09:19:34.7661320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7661500Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7663209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7663386Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7665103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. 
FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7665275Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7666973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7667201Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7667671Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7668209Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7669292Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7669749Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7670639Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7670997Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7671860Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7672292Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7673151Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7673599Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7674500Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7674911Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7675764Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7676217Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7677819Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7678144Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7678741Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7679908Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7680298Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7680942Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7681435Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7681836Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7682309Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7683209Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7683669Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7684559Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7684914Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7685774Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7686207Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7687060Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7687553Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7688405Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7688815Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7689671Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7690119Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7691720Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7692055Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7692638Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7693855Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7694191Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7694835Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7695330Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7695734Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7696434Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7697609Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7698122Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7699124Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7699528Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7700512Z 
[rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7701081Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7702046Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7702547Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7703504Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7703970Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7704938Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7705445Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7707243Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.7707673Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7708340Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7709818Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7710155Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7710793Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7711293Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7711700Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7712176Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7713074Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7713529Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7714417Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7714777Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7715697Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7716133Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7716983Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7717429Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7718280Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7718691Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7719552Z [rank3]:E1204 09:17:55.273000 24740 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7719999Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7721968Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:19:34.7722456Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7723122Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7724432Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7724808Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7725526Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7726091Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7726196Z dist init r=3, world=4 2025-12-04T09:19:34.7726293Z dist init r=1, world=4 2025-12-04T09:19:34.7726405Z dist init r=0, world=4 2025-12-04T09:19:34.7726500Z dist init r=2, world=4 2025-12-04T09:19:34.7727677Z [rank0]:[W1204 09:17:55.287790420 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7727778Z FAILED [8.2907s] [100%] 2025-12-04T09:19:34.7727784Z 2025-12-04T09:19:34.7727938Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7728393Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7728515Z Traceback (most recent call last): 2025-12-04T09:19:34.7729136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7729260Z self._join_processes(fn) 2025-12-04T09:19:34.7729847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7730001Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7730608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7730722Z raise RuntimeError(error) 2025-12-04T09:19:34.7730968Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7731092Z Traceback (most recent call last): 2025-12-04T09:19:34.7731648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7731757Z getattr(self, test_name)() 2025-12-04T09:19:34.7732298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7732396Z fn() 2025-12-04T09:19:34.7732908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7733013Z method(*args, **kwargs) 2025-12-04T09:19:34.7733639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7733732Z method(*args, **kwargs) 2025-12-04T09:19:34.7734195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7734334Z with policy(): 2025-12-04T09:19:34.7734791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7734898Z raise RuntimeError(msg) 2025-12-04T09:19:34.7736096Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 
2025-12-04T09:19:34.7736102Z 2025-12-04T09:19:34.7736366Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7737365Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7737375Z 2025-12-04T09:19:34.7737641Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7737659Z 2025-12-04T09:19:34.7737663Z 2025-12-04T09:19:34.7737885Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7738151Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7739037Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ccaa5b3b6bf09af7.xml - 2025-12-04T09:19:34.7739210Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7740238Z FAILED [8.2907s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7740358Z Traceback (most recent call last): 2025-12-04T09:19:34.7740914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7741041Z getattr(self, test_name)() 2025-12-04T09:19:34.7741646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7741739Z fn() 2025-12-04T09:19:34.7742265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7742368Z method(*args, **kwargs) 2025-12-04T09:19:34.7742879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7742983Z method(*args, **kwargs) 2025-12-04T09:19:34.7743485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7743592Z with policy(): 2025-12-04T09:19:34.7744103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7744208Z raise RuntimeError(msg) 2025-12-04T09:19:34.7745567Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:19:34.7745574Z 2025-12-04T09:19:34.7745785Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7746644Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7746650Z 2025-12-04T09:19:34.7746915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7747159Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
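Note on the failure above: the RuntimeError is raised by PyTorch's CUDA memory-leak check (the `with policy():` context manager entered in common_utils.py), which records per-device memory from both the caching allocator and the CUDA driver before the test body and raises on exit if either number has grown. The PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 command printed in the log re-runs only this test with that check enabled. Below is a minimal, illustrative sketch of the idea using public torch.cuda APIs; the function names are assumptions for illustration and this is not the actual leak-check implementation from common_utils.py.

import torch

def driver_allocated_bytes(device: int) -> int:
    # torch.cuda.mem_get_info reports (free, total) bytes from the CUDA driver,
    # so total - free approximates what the driver currently has allocated.
    free, total = torch.cuda.mem_get_info(device)
    return total - free

def run_with_leak_check(test_fn, device: int = 0) -> None:
    # Snapshot caching-allocator and driver-level usage before the test body...
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    driver_before = driver_allocated_bytes(device)
    test_fn()
    # ...and compare after it; a persistent increase on both counters is
    # reported as a leak, mirroring the numbers quoted in the log above.
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    driver_after = driver_allocated_bytes(device)
    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after}, driver {driver_before} -> {driver_after}"
        )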
2025-12-04T09:19:34.7747335Z ======================= 1 failed, 7 deselected in 8.31s ======================== 2025-12-04T09:19:34.7747431Z Got exit code 1 2025-12-04T09:19:34.7747544Z Retrying single test... 2025-12-04T09:19:34.7748238Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ca39f8152ef39349.xml 2025-12-04T09:19:34.7748397Z ============================= test session starts ============================== 2025-12-04T09:19:34.7748852Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7749058Z cachedir: .pytest_cache 2025-12-04T09:19:34.7749526Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7749638Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7749733Z configfile: pytest.ini 2025-12-04T09:19:34.7750216Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7750403Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7751231Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7751332Z Running 1 items in this shard 2025-12-04T09:19:34.7751336Z 2025-12-04T09:19:34.7752411Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda I1204 09:18:01.694000 25022 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 25074 2025-12-04T09:19:34.7752863Z I1204 09:18:01.695000 25022 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 25075 2025-12-04T09:19:34.7753306Z I1204 09:18:01.696000 25022 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 25076 2025-12-04T09:19:34.7753812Z I1204 09:18:01.696000 25022 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 25077 2025-12-04T09:19:34.7755341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7755493Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7757009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7757159Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7758671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. 
FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7758813Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7760331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7760524Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7760939Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7761415Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7762302Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7762760Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7763644Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7763999Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7764851Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7765289Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7766141Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7766573Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7767474Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7767871Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7768729Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7769165Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7770768Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7771096Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7771683Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7772845Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7773217Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7773861Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7774342Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7774745Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7775215Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7776109Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7776797Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7777793Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7778195Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7779152Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7779650Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7780684Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7781178Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7782133Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7782579Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7783553Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7784045Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7785852Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 493813760 and is now 630128640. 2025-12-04T09:19:34.7786215Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7786878Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7788242Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7788713Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7789484Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7789966Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7790370Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7790843Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7791742Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7792193Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7793066Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7793422Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7794274Z 
[rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7794762Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7795615Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7796053Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7796899Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7797298Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7798159Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7798594Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7800201Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.7800587Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7801181Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7802344Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7802666Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7803304Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7803787Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7804196Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7804668Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7805560Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7806015Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7806891Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7807308Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7808163Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7808601Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7809450Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7809892Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7810744Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7811137Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7812000Z [rank1]:E1204 09:18:08.463000 25075 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7812437Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7814056Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7814429Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7815020Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7816172Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7816581Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7817478Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7818024Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7818134Z dist init r=1, world=4 2025-12-04T09:19:34.7818232Z dist init r=0, world=4 2025-12-04T09:19:34.7818327Z dist init r=2, world=4 2025-12-04T09:19:34.7818426Z dist init r=3, world=4 2025-12-04T09:19:34.7819580Z [rank0]:[W1204 09:18:08.478167312 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7819690Z FAILED [8.9816s] [100%] 2025-12-04T09:19:34.7819695Z 2025-12-04T09:19:34.7819839Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7820277Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7820461Z Traceback (most recent call last): 2025-12-04T09:19:34.7821198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7821319Z self._join_processes(fn) 2025-12-04T09:19:34.7821913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7822053Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7822671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7822786Z raise RuntimeError(error) 2025-12-04T09:19:34.7823019Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7823140Z Traceback (most recent call last): 2025-12-04T09:19:34.7823688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7823805Z getattr(self, test_name)() 2025-12-04T09:19:34.7824337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7824422Z fn() 2025-12-04T09:19:34.7824932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7825033Z method(*args, **kwargs) 2025-12-04T09:19:34.7825533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7825741Z method(*args, **kwargs) 2025-12-04T09:19:34.7826249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7826351Z with policy(): 2025-12-04T09:19:34.7826864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7826968Z raise RuntimeError(msg) 2025-12-04T09:19:34.7828316Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7828323Z 2025-12-04T09:19:34.7828538Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7829401Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7829411Z 2025-12-04T09:19:34.7829677Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7829682Z 2025-12-04T09:19:34.7829853Z Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7829970Z Traceback (most recent call last): 2025-12-04T09:19:34.7830519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7830635Z getattr(self, test_name)() 2025-12-04T09:19:34.7831170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7831256Z fn() 2025-12-04T09:19:34.7831767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7831873Z method(*args, **kwargs) 2025-12-04T09:19:34.7832382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7832485Z method(*args, **kwargs) 2025-12-04T09:19:34.7833151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7833251Z with policy(): 2025-12-04T09:19:34.7833727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7833827Z raise RuntimeError(msg) 2025-12-04T09:19:34.7835101Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.7835110Z 2025-12-04T09:19:34.7835471Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7836301Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7836311Z 2025-12-04T09:19:34.7836564Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7836570Z 2025-12-04T09:19:34.7836732Z Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7836843Z Traceback (most recent call last): 2025-12-04T09:19:34.7837373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7837482Z getattr(self, test_name)() 2025-12-04T09:19:34.7837998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7838192Z fn() 2025-12-04T09:19:34.7838696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7838793Z method(*args, **kwargs) 2025-12-04T09:19:34.7839289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7839390Z method(*args, **kwargs) 2025-12-04T09:19:34.7839877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7839979Z with policy(): 2025-12-04T09:19:34.7840475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7840577Z raise RuntimeError(msg) 2025-12-04T09:19:34.7841979Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7841988Z 2025-12-04T09:19:34.7842187Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7842991Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7842996Z 2025-12-04T09:19:34.7843243Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7843248Z 2025-12-04T09:19:34.7843252Z 2025-12-04T09:19:34.7843461Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7843710Z Process 0 terminated with exit code 10, terminating remaining processes. 
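Note on the repeated ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit"): it refers to explicit torch.distributed teardown. A minimal sketch of the pattern the warning asks for, assuming MASTER_ADDR/MASTER_PORT are already set by the launcher (illustrative; the multiprocess test harness in this log drives init and shutdown itself):

import torch
import torch.distributed as dist

def run(rank: int, world_size: int) -> None:
    # Pin this process to its GPU and join the NCCL process group
    # (rendezvous via the default env:// method, so MASTER_ADDR/MASTER_PORT
    # are assumed to be set by the launcher).
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        ...  # test or training body for this rank
    finally:
        # Explicit teardown releases the NCCL communicators and avoids the
        # "destroy_process_group() was not called" warning seen above.
        dist.destroy_process_group()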
2025-12-04T09:19:34.7844515Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ca39f8152ef39349.xml - 2025-12-04T09:19:34.7844687Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7846009Z FAILED [8.9816s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7846128Z Traceback (most recent call last): 2025-12-04T09:19:34.7846648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7846751Z getattr(self, test_name)() 2025-12-04T09:19:34.7847261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7847344Z fn() 2025-12-04T09:19:34.7847831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7847934Z method(*args, **kwargs) 2025-12-04T09:19:34.7848409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7848515Z method(*args, **kwargs) 2025-12-04T09:19:34.7848985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7849079Z with policy(): 2025-12-04T09:19:34.7849562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7849664Z raise RuntimeError(msg) 2025-12-04T09:19:34.7850934Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7851007Z 2025-12-04T09:19:34.7851208Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7852013Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7852019Z 2025-12-04T09:19:34.7852267Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7852272Z 2025-12-04T09:19:34.7852424Z Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7852542Z Traceback (most recent call last): 2025-12-04T09:19:34.7853057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7853162Z getattr(self, test_name)() 2025-12-04T09:19:34.7853661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7853744Z fn() 2025-12-04T09:19:34.7854221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7854325Z method(*args, **kwargs) 2025-12-04T09:19:34.7854795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7854896Z method(*args, **kwargs) 2025-12-04T09:19:34.7855362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7855456Z with policy(): 2025-12-04T09:19:34.7855931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7856031Z raise RuntimeError(msg) 2025-12-04T09:19:34.7857616Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.7857689Z 2025-12-04T09:19:34.7857902Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7858757Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7858762Z 2025-12-04T09:19:34.7859023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7859028Z 2025-12-04T09:19:34.7859194Z Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7859308Z Traceback (most recent call last): 2025-12-04T09:19:34.7859850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7859971Z getattr(self, test_name)() 2025-12-04T09:19:34.7860502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7860595Z fn() 2025-12-04T09:19:34.7861104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7861203Z method(*args, **kwargs) 2025-12-04T09:19:34.7861710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7861809Z method(*args, **kwargs) 2025-12-04T09:19:34.7862311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7862412Z with policy(): 2025-12-04T09:19:34.7862919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7863080Z raise RuntimeError(msg) 2025-12-04T09:19:34.7864452Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7864459Z 2025-12-04T09:19:34.7864668Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7865522Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7865527Z 2025-12-04T09:19:34.7865787Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7865971Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
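Note on the _init_utils.py UserWarning repeated in both runs above ("FSDP got the argument `device_id` cuda ... which does not have an explicit index"): it is triggered by passing a bare "cuda" device as device_id. A minimal sketch of the two remedies the warning itself suggests, assuming the process group is already initialized and one rank per GPU (illustrative, not the test's actual wrapping code):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_for_rank(model: torch.nn.Module, rank: int) -> FSDP:
    # Remedy 1: make the current device explicit before constructing FSDP,
    # so that even a bare "cuda" device_id resolves to the intended GPU.
    torch.cuda.set_device(rank)
    # Remedy 2: pass a fully indexed device instead of the bare "cuda" string.
    return FSDP(model, device_id=torch.device("cuda", rank))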
2025-12-04T09:19:34.7866149Z ======================= 1 failed, 7 deselected in 9.00s ======================== 2025-12-04T09:19:34.7866242Z Got exit code 1 2025-12-04T09:19:34.7867021Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7867425Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.7868099Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7178045a44a28781.xml 2025-12-04T09:19:34.7868266Z ============================= test session starts ============================== 2025-12-04T09:19:34.7868700Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7868797Z cachedir: .pytest_cache 2025-12-04T09:19:34.7869252Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7869359Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7869458Z configfile: pytest.ini 2025-12-04T09:19:34.7869984Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7870171Z collecting ... collected 8 items / 6 deselected / 2 selected 2025-12-04T09:19:34.7870290Z stepcurrent: skipping 6 already run items. 2025-12-04T09:19:34.7870387Z Running 2 items in this shard 2025-12-04T09:19:34.7870391Z 2025-12-04T09:19:34.7871303Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda I1204 09:18:14.894000 25359 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 25411 2025-12-04T09:19:34.7871738Z I1204 09:18:14.894000 25359 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 25412 2025-12-04T09:19:34.7872183Z I1204 09:18:14.895000 25359 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 25413 2025-12-04T09:19:34.7872617Z I1204 09:18:14.896000 25359 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 25414 2025-12-04T09:19:34.7874155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7874310Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7875841Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.7876044Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7877558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7877706Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7879207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7879363Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7879769Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7880243Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7881134Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7881580Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7882693Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7883064Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7883972Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7884430Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7885328Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7885793Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7886698Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7887125Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7888032Z [rank0]:E1204 09:18:21.732000 25411 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7888498Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7890067Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.7890410Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7891210Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7892283Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7892641Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7893335Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7893867Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7894301Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7894811Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7895794Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7896353Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7897578Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7897978Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7898940Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7899430Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7900400Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7900890Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7901852Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7902305Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7903263Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7903803Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7905427Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.7905788Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7906453Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7907561Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7907935Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7908852Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7909370Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7909793Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7910291Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7911290Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7911768Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7912704Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T09:19:34.7913073Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7913975Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7914443Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7915345Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7915806Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7916706Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7917124Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7918253Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7918727Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7920301Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 632225792. 
2025-12-04T09:19:34.7920647Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7921605Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7922720Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7923089Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7923804Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7924346Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7924810Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7925334Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7926436Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7926944Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7927938Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7928329Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7929292Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7929784Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7930742Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7931231Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7932186Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7932703Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7933765Z [rank3]:E1204 09:18:21.734000 25414 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7934222Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7935744Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 632225792. 2025-12-04T09:19:34.7936088Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7936956Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7938069Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7938435Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7939147Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7939694Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7939806Z dist init r=3, world=4 2025-12-04T09:19:34.7939903Z dist init r=0, world=4 2025-12-04T09:19:34.7940071Z dist init r=1, world=4 2025-12-04T09:19:34.7940167Z dist init r=2, world=4 2025-12-04T09:19:34.7940266Z FAILED [8.4909s] [ 50%] 2025-12-04T09:19:34.7940271Z 2025-12-04T09:19:34.7940432Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7940727Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda _________ 2025-12-04T09:19:34.7940847Z Traceback (most recent call last): 2025-12-04T09:19:34.7941399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7941510Z self._join_processes(fn) 2025-12-04T09:19:34.7942100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7942243Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7942847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7942966Z raise RuntimeError(error) 2025-12-04T09:19:34.7943199Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7943323Z Traceback (most recent call last): 2025-12-04T09:19:34.7943864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7943976Z getattr(self, test_name)() 2025-12-04T09:19:34.7944512Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7944597Z fn() 2025-12-04T09:19:34.7945157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7945265Z method(*args, **kwargs) 2025-12-04T09:19:34.7945774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7945885Z method(*args, **kwargs) 2025-12-04T09:19:34.7946389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7946485Z with policy(): 2025-12-04T09:19:34.7947000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7947107Z raise RuntimeError(msg) 2025-12-04T09:19:34.7948278Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.7948294Z 2025-12-04T09:19:34.7948618Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7949337Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7949342Z 2025-12-04T09:19:34.7949587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7949591Z 2025-12-04T09:19:34.7949596Z 2025-12-04T09:19:34.7949789Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7950025Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.7950779Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7178045a44a28781.xml - 2025-12-04T09:19:34.7950932Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7951665Z FAILED [8.4909s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7951836Z Traceback (most recent call last): 2025-12-04T09:19:34.7952332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7952429Z getattr(self, test_name)() 2025-12-04T09:19:34.7952905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7952990Z fn() 2025-12-04T09:19:34.7953440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7953532Z method(*args, **kwargs) 2025-12-04T09:19:34.7953985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7954081Z method(*args, **kwargs) 2025-12-04T09:19:34.7954535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7954618Z with policy(): 2025-12-04T09:19:34.7955064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7955163Z raise RuntimeError(msg) 2025-12-04T09:19:34.7956191Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.7956197Z 2025-12-04T09:19:34.7956395Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7957033Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7957038Z 2025-12-04T09:19:34.7957271Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7957435Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.7957593Z ======================= 1 failed, 6 deselected in 8.51s ======================== 2025-12-04T09:19:34.7957688Z Got exit code 1 2025-12-04T09:19:34.7957779Z Retrying single test... 
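Separately, the UserWarning repeated by torch/distributed/fsdp/_init_utils.py in each session ("FSDP got the argument `device_id` cuda ... which does not have an explicit index") states its own fix: call torch.cuda.set_device() before FSDP initialization or pass an indexed device as device_id. A minimal sketch of that pattern follows; the toy model and the rank/world_size arguments are assumptions for illustration and are not taken from the failing test.

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(rank: int, world_size: int) -> FSDP:
        # Assumes MASTER_ADDR/MASTER_PORT are provided by the launcher.
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)           # make the current device explicit, as the warning suggests
        model = nn.Linear(16, 16).cuda(rank)  # toy model for illustration
        # Passing an indexed device (an int rank or torch.device(f"cuda:{rank}"))
        # instead of the bare "cuda" string avoids the warning.
        return FSDP(model, device_id=torch.device(f"cuda:{rank}"))

Per the warning text, either the explicit set_device call or the indexed device_id on its own is enough to silence it.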
2025-12-04T09:19:34.7958382Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cdb7b80b8b392fad.xml 2025-12-04T09:19:34.7958534Z ============================= test session starts ============================== 2025-12-04T09:19:34.7958843Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7958939Z cachedir: .pytest_cache 2025-12-04T09:19:34.7959403Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7959507Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7959609Z configfile: pytest.ini 2025-12-04T09:19:34.7960078Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7960260Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7960921Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7961017Z Running 1 items in this shard 2025-12-04T09:19:34.7961021Z 2025-12-04T09:19:34.7961934Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda I1204 09:18:28.114000 25688 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 25740 2025-12-04T09:19:34.7962378Z I1204 09:18:28.114000 25688 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 25741 2025-12-04T09:19:34.7962866Z I1204 09:18:28.115000 25688 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 25742 2025-12-04T09:19:34.7963311Z I1204 09:18:28.116000 25688 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 25743 2025-12-04T09:19:34.7964837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7964993Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7966512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7966662Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7968173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.7968369Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7969876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7970017Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7970426Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7970899Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7971787Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7972237Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7973130Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7973481Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7974331Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7974768Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7975670Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7976109Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7977243Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7977698Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7978657Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7979151Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7980781Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.7981146Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7981805Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7982976Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7983349Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7984067Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7984610Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7985068Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7985595Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7986607Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7987115Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7988105Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7988502Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7989505Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7989942Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7990836Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7991272Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7992120Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7992524Z 
[rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7993383Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7993812Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7995249Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.7995571Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7996223Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7997377Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7997720Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7998390Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7998900Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7999328Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7999825Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8000779Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8001251Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8002191Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8002559Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8003517Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8003984Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T09:19:34.8004884Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8005341Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8006242Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8006663Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8007578Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8008039Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8009560Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 632225792. 2025-12-04T09:19:34.8009954Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8010579Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8011686Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8012013Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8012640Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8013122Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.8013529Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8014003Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8014897Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8015342Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8016271Z [rank3]:E1204 09:18:34.898000 25743 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8016803Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8017817Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8018305Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8019261Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8019756Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8020722Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8021344Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8022311Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8022798Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8024409Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 498008064 and is now 632225792. 
2025-12-04T09:19:34.8024880Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8025545Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8026650Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8027015Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8027731Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8028278Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.8028386Z dist init r=0, world=4 2025-12-04T09:19:34.8028486Z dist init r=2, world=4 2025-12-04T09:19:34.8028580Z dist init r=1, world=4 2025-12-04T09:19:34.8028682Z dist init r=3, world=4 2025-12-04T09:19:34.8028777Z FAILED [8.5045s] [100%] 2025-12-04T09:19:34.8028783Z 2025-12-04T09:19:34.8028935Z =================================== FAILURES =================================== 2025-12-04T09:19:34.8029229Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda _________ 2025-12-04T09:19:34.8029348Z Traceback (most recent call last): 2025-12-04T09:19:34.8029905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.8030023Z self._join_processes(fn) 2025-12-04T09:19:34.8030611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.8030829Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.8031434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.8031553Z raise RuntimeError(error) 2025-12-04T09:19:34.8031784Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.8031903Z Traceback (most recent call last): 2025-12-04T09:19:34.8032445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8032555Z getattr(self, test_name)() 2025-12-04T09:19:34.8033178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8033269Z fn() 2025-12-04T09:19:34.8033749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8033860Z method(*args, **kwargs) 2025-12-04T09:19:34.8034332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8034428Z method(*args, **kwargs) 2025-12-04T09:19:34.8034908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8034998Z with policy(): 2025-12-04T09:19:34.8035485Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8035583Z raise RuntimeError(msg) 2025-12-04T09:19:34.8036727Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 632225792. 2025-12-04T09:19:34.8036733Z 2025-12-04T09:19:34.8036946Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8037565Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8037569Z 2025-12-04T09:19:34.8037821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8037826Z 2025-12-04T09:19:34.8037830Z 2025-12-04T09:19:34.8038038Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.8038283Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.8039105Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cdb7b80b8b392fad.xml - 2025-12-04T09:19:34.8039265Z =========================== short test summary info ============================ 2025-12-04T09:19:34.8040051Z FAILED [8.5045s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.8040163Z Traceback (most recent call last): 2025-12-04T09:19:34.8040682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8040791Z getattr(self, test_name)() 2025-12-04T09:19:34.8041295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8041486Z fn() 2025-12-04T09:19:34.8041935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8042029Z method(*args, **kwargs) 2025-12-04T09:19:34.8042536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8042630Z method(*args, **kwargs) 2025-12-04T09:19:34.8043079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8043173Z with policy(): 2025-12-04T09:19:34.8043627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8043731Z raise RuntimeError(msg) 2025-12-04T09:19:34.8044767Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 632225792. 
2025-12-04T09:19:34.8044776Z 2025-12-04T09:19:34.8044965Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8045560Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8045565Z 2025-12-04T09:19:34.8045798Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8045964Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.8046122Z ======================= 1 failed, 7 deselected in 8.53s ======================== 2025-12-04T09:19:34.8046206Z Got exit code 1 2025-12-04T09:19:34.8046307Z Retrying single test... 2025-12-04T09:19:34.8046910Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-9595731043617943.xml 2025-12-04T09:19:34.8047178Z ============================= test session starts ============================== 2025-12-04T09:19:34.8047486Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.8047578Z cachedir: .pytest_cache 2025-12-04T09:19:34.8048042Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.8048151Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.8048243Z configfile: pytest.ini 2025-12-04T09:19:34.8048723Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.8048905Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.8049564Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8049665Z Running 1 items in this shard 2025-12-04T09:19:34.8049670Z 2025-12-04T09:19:34.8050588Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda I1204 09:18:41.314000 26017 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 26069 2025-12-04T09:19:34.8051040Z I1204 09:18:41.315000 26017 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 26070 2025-12-04T09:19:34.8051479Z I1204 09:18:41.316000 26017 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 26071 2025-12-04T09:19:34.8051925Z I1204 09:18:41.316000 26017 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 26072 2025-12-04T09:19:34.8053462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8053617Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8055180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8055330Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8057128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8057338Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8059047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8059209Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8059674Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8060208Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8061281Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8061788Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8062773Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8063170Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8064126Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8064624Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8065586Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8066077Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8067036Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8067476Z [rank0]:E1204 09:18:48.088000 26069 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8068504Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8069097Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8070528Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.8070850Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8071612Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8072656Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8072994Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8073672Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8074185Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.8074665Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8075163Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8076120Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8076596Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8077521Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8077897Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8078804Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8079265Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.8080164Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8080624Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8081521Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8081940Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8082900Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8083478Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8084913Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.8085236Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8085826Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8086808Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8087124Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8087757Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8088295Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.8088703Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8089171Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8090064Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8090511Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8091384Z [rank3]:E1204 09:18:48.089000 26072 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8091742Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8092592Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8093025Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8093878Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8094304Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8095221Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8095613Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8096547Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8097201Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8098816Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 632225792. 
2025-12-04T09:19:34.8099188Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8099850Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8100960Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8101324Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8102111Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8102655Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.8103115Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8103644Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8104646Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8105160Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8106156Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8106561Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8107517Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8108010Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8109055Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8109490Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8110391Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8110782Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8111646Z [rank1]:E1204 09:18:48.089000 26070 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8112083Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8113521Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 632225792. 2025-12-04T09:19:34.8113841Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8114425Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8115403Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8115772Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8116416Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8116898Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.8116994Z dist init r=2, world=4 2025-12-04T09:19:34.8117084Z dist init r=0, world=4 2025-12-04T09:19:34.8117171Z dist init r=1, world=4 2025-12-04T09:19:34.8117262Z dist init r=3, world=4 2025-12-04T09:19:34.8117344Z FAILED [8.6665s] [100%] 2025-12-04T09:19:34.8117349Z 2025-12-04T09:19:34.8117478Z =================================== FAILURES =================================== 2025-12-04T09:19:34.8117748Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda _________ 2025-12-04T09:19:34.8117855Z Traceback (most recent call last): 2025-12-04T09:19:34.8118344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.8118448Z self._join_processes(fn) 2025-12-04T09:19:34.8118963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.8119094Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.8119628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.8119727Z raise RuntimeError(error) 2025-12-04T09:19:34.8119938Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.8120047Z Traceback (most recent call last): 2025-12-04T09:19:34.8120533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8120633Z getattr(self, test_name)() 2025-12-04T09:19:34.8121482Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8121582Z fn() 2025-12-04T09:19:34.8122087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8122189Z method(*args, **kwargs) 2025-12-04T09:19:34.8122699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8122801Z method(*args, **kwargs) 2025-12-04T09:19:34.8123318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8123412Z with policy(): 2025-12-04T09:19:34.8123921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8124035Z raise RuntimeError(msg) 2025-12-04T09:19:34.8125201Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.8125207Z 2025-12-04T09:19:34.8125426Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8126079Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8126085Z 2025-12-04T09:19:34.8126345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8126362Z 2025-12-04T09:19:34.8126523Z Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.8126738Z Traceback (most recent call last): 2025-12-04T09:19:34.8127291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8127404Z getattr(self, test_name)() 2025-12-04T09:19:34.8127944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8128041Z fn() 2025-12-04T09:19:34.8128547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8128661Z method(*args, **kwargs) 2025-12-04T09:19:34.8129163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8129266Z method(*args, **kwargs) 2025-12-04T09:19:34.8129772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8129874Z with policy(): 2025-12-04T09:19:34.8130382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8130500Z raise RuntimeError(msg) 2025-12-04T09:19:34.8131669Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 632225792. 
2025-12-04T09:19:34.8131675Z 
2025-12-04T09:19:34.8131894Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.8132547Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda
2025-12-04T09:19:34.8132553Z 
2025-12-04T09:19:34.8132814Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.8132829Z 
2025-12-04T09:19:34.8132834Z 
2025-12-04T09:19:34.8133050Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:19:34.8133314Z Process 0 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:19:34.8134272Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-9595731043617943.xml -
2025-12-04T09:19:34.8134427Z =========================== short test summary info ============================
2025-12-04T09:19:34.8135170Z FAILED [8.6665s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T09:19:34.8135276Z Traceback (most recent call last):
2025-12-04T09:19:34.8135759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.8135868Z getattr(self, test_name)()
2025-12-04T09:19:34.8136406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.8136485Z fn()
2025-12-04T09:19:34.8137157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8137261Z method(*args, **kwargs)
2025-12-04T09:19:34.8137772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8137874Z method(*args, **kwargs)
2025-12-04T09:19:34.8138378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.8138484Z with policy():
2025-12-04T09:19:34.8138992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.8139162Z raise RuntimeError(msg)
2025-12-04T09:19:34.8140327Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696.
2025-12-04T09:19:34.8140333Z 
2025-12-04T09:19:34.8140544Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.8141215Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda
2025-12-04T09:19:34.8141220Z 
2025-12-04T09:19:34.8141482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.8141488Z 
2025-12-04T09:19:34.8141656Z Process 3 exited with error code 10 and exception:
2025-12-04T09:19:34.8141780Z Traceback (most recent call last):
2025-12-04T09:19:34.8142324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.8142448Z getattr(self, test_name)()
2025-12-04T09:19:34.8142988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.8143087Z fn()
2025-12-04T09:19:34.8143591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8143693Z method(*args, **kwargs)
2025-12-04T09:19:34.8144203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8144307Z method(*args, **kwargs)
2025-12-04T09:19:34.8144804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.8144913Z with policy():
2025-12-04T09:19:34.8145423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.8145542Z raise RuntimeError(msg)
2025-12-04T09:19:34.8146769Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 632225792.
2025-12-04T09:19:34.8146776Z 
2025-12-04T09:19:34.8146989Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.8147651Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda
2025-12-04T09:19:34.8147656Z 
2025-12-04T09:19:34.8147913Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.8148099Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
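[Editor's note] The repeated "CUDA driver API confirmed a leak" errors above come from the test-time leak checker that this job enables with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: it snapshots the caching-allocator bytes and the driver-level memory usage before the test body and compares them again afterwards. The sketch below is a minimal, hypothetical illustration of that before/after bracketing using public torch.cuda APIs; it is not the actual checker implementation in common_utils.py, and the threshold-free comparison is simplified.

    import torch

    def check_for_cuda_leak(fn, device=0):
        # Snapshot caching-allocator bytes and driver-level usage before the test body.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)
        driver_before = total - free_before

        fn()  # run the test body

        # Release cached blocks so only genuinely live allocations remain, then re-measure.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        # Flag a leak only when both the allocator and the driver report growth,
        # mirroring the shape of the message seen in the log above.
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak: allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver {driver_before} -> {driver_after} bytes on device {device}"
            )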
2025-12-04T09:19:34.8148277Z ======================= 1 failed, 7 deselected in 8.69s ========================
2025-12-04T09:19:34.8148374Z Got exit code 1
2025-12-04T09:19:34.8149164Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda
2025-12-04T09:19:34.8149519Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:19:34.8150126Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f8bd87b046fcc0d3.xml
2025-12-04T09:19:34.8150269Z ============================= test session starts ==============================
2025-12-04T09:19:34.8150576Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:19:34.8150683Z cachedir: .pytest_cache
2025-12-04T09:19:34.8151135Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:19:34.8151293Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:19:34.8151393Z configfile: pytest.ini
2025-12-04T09:19:34.8151871Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:19:34.8152059Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T09:19:34.8152180Z stepcurrent: skipping 7 already run items.
2025-12-04T09:19:34.8152279Z Running 1 items in this shard
2025-12-04T09:19:34.8152283Z 
2025-12-04T09:19:34.8153214Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda I1204 09:18:54.494000 26346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 26398
2025-12-04T09:19:34.8153651Z I1204 09:18:54.495000 26346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 26399
2025-12-04T09:19:34.8154098Z I1204 09:18:54.496000 26346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 26400
2025-12-04T09:19:34.8154536Z I1204 09:18:54.497000 26346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 26401
2025-12-04T09:19:34.8156057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.8156210Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.8157725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.8157932Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8159443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8159595Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8161109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8161264Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8161671Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8162141Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8163036Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8163486Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8164431Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8164787Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8165649Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8166083Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8166933Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8167376Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8168226Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8168635Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8169491Z [rank0]:E1204 09:19:01.223000 26398 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8169940Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8171424Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.8171754Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8172348Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8173332Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8173664Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8174300Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8174790Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.8175190Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8175658Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8176795Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8177646Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8178645Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8179046Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8180023Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8180512Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8181477Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8181976Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8182932Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8183387Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8184346Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8184848Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8186517Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 602865664 and is now 632225792. 2025-12-04T09:19:34.8186880Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8187551Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8188651Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8189123Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8189757Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8190245Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.8190643Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8191111Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8192074Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8192525Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8193411Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T09:19:34.8193758Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8194619Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8195055Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8195905Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8196343Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8197189Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8197593Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8198446Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8198944Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8200380Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 
2025-12-04T09:19:34.8200701Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8201295Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8202472Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8202817Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8203487Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8204005Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.8204427Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8204974Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8205928Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8206403Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8207343Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8207710Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8208625Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8209087Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8209991Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8210634Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8211558Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8212004Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8212990Z [rank3]:E1204 09:19:01.225000 26401 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8213477Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8215042Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 583991296 and is now 632225792. 2025-12-04T09:19:34.8215392Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8216041Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8217382Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8217751Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8218463Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8219016Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.8219184Z dist init r=1, world=4 2025-12-04T09:19:34.8219284Z dist init r=3, world=4 2025-12-04T09:19:34.8219393Z dist init r=2, world=4 2025-12-04T09:19:34.8219497Z dist init r=0, world=4 2025-12-04T09:19:34.8219593Z FAILED [8.3923s] [100%] 2025-12-04T09:19:34.8219599Z 2025-12-04T09:19:34.8219755Z =================================== FAILURES =================================== 2025-12-04T09:19:34.8220053Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda _________ 2025-12-04T09:19:34.8220175Z Traceback (most recent call last): 2025-12-04T09:19:34.8220731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.8221014Z self._join_processes(fn) 2025-12-04T09:19:34.8221612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.8221764Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.8222374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.8222503Z raise RuntimeError(error) 2025-12-04T09:19:34.8222742Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.8222875Z Traceback (most recent call last): 2025-12-04T09:19:34.8223416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8223530Z getattr(self, test_name)() 2025-12-04T09:19:34.8224071Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8224162Z fn() 2025-12-04T09:19:34.8224669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8224792Z method(*args, **kwargs) 2025-12-04T09:19:34.8225297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8225515Z method(*args, **kwargs) 2025-12-04T09:19:34.8226025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8226121Z with policy(): 2025-12-04T09:19:34.8226642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8226753Z raise RuntimeError(msg) 2025-12-04T09:19:34.8227912Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.8227931Z 2025-12-04T09:19:34.8228150Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8228813Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8228819Z 2025-12-04T09:19:34.8229102Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8229108Z 2025-12-04T09:19:34.8229112Z 2025-12-04T09:19:34.8229336Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.8229607Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.8230475Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f8bd87b046fcc0d3.xml -
2025-12-04T09:19:34.8230646Z =========================== short test summary info ============================
2025-12-04T09:19:34.8231543Z FAILED [8.3923s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T09:19:34.8231664Z Traceback (most recent call last):
2025-12-04T09:19:34.8232234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.8232351Z getattr(self, test_name)()
2025-12-04T09:19:34.8232987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.8233082Z fn()
2025-12-04T09:19:34.8239824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8239978Z method(*args, **kwargs)
2025-12-04T09:19:34.8240609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8240726Z method(*args, **kwargs)
2025-12-04T09:19:34.8241208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.8241303Z with policy():
2025-12-04T09:19:34.8241793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.8241894Z raise RuntimeError(msg)
2025-12-04T09:19:34.8242989Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696.
2025-12-04T09:19:34.8243001Z 
2025-12-04T09:19:34.8243201Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.8243823Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda
2025-12-04T09:19:34.8243834Z 
2025-12-04T09:19:34.8244089Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.8244371Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:19:34.8244544Z ======================= 1 failed, 7 deselected in 8.41s ========================
2025-12-04T09:19:34.8244635Z Got exit code 1
2025-12-04T09:19:34.8244730Z Retrying single test...
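[Editor's note] The harness behaviour visible here (run the shard, and on a failure rerun just the failing test before either declaring it "FAILED CONSISTENTLY" or treating it as flaky, then continue because continue-through-error is set) can be approximated as below. This is a hypothetical sketch, not the actual PyTorch run_test.py logic; the helper name, retry count, and the way the node id is passed are illustrative only.

    import subprocess

    def rerun_failed_test(test_id: str, retries: int = 1) -> str:
        # Re-invoke pytest on just the failing node id to separate flaky from consistent failures.
        for _ in range(retries):
            result = subprocess.run(["python", "-m", "pytest", "-x", test_id])
            if result.returncode == 0:
                return "FLAKY"  # passed in isolation -> treat as flaky
        return "FAILED CONSISTENTLY"  # still failing -> report and continue with the rest

    status = rerun_failed_test(
        "test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda"
    )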
2025-12-04T09:19:34.8245373Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-68dc7893385d1617.xml
2025-12-04T09:19:34.8245524Z ============================= test session starts ==============================
2025-12-04T09:19:34.8245852Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:19:34.8245961Z cachedir: .pytest_cache
2025-12-04T09:19:34.8246445Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:19:34.8246567Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:19:34.8246665Z configfile: pytest.ini
2025-12-04T09:19:34.8247166Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:19:34.8247367Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T09:19:34.8248064Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda
2025-12-04T09:19:34.8248168Z Running 1 items in this shard
2025-12-04T09:19:34.8248174Z 
2025-12-04T09:19:34.8249143Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda I1204 09:19:07.584000 26675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 26727
2025-12-04T09:19:34.8249691Z I1204 09:19:07.585000 26675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 26728
2025-12-04T09:19:34.8250162Z I1204 09:19:07.586000 26675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 26729
2025-12-04T09:19:34.8250724Z I1204 09:19:07.586000 26675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 26730
2025-12-04T09:19:34.8252255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.8252402Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.8253935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.8254087Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.8255588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.8255736Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8257754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8257923Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8258381Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8258917Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8259919Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8260431Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8261427Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8261822Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8262788Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8263273Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8264299Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8264789Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8265746Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8266193Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8267155Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8267652Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8269334Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.8269661Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8270242Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8271225Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8271607Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8272243Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8272729Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.8273125Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8273598Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8274490Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8274941Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8275823Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8276170Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8277023Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8277503Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8278367Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8278796Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8279645Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8280044Z 
[rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8280897Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8281337Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8282775Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 487522304 and is now 632225792. 2025-12-04T09:19:34.8283106Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8283695Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8284727Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8285056Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8285688Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8286175Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.8286573Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8287044Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8287939Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8288389Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8289264Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8289613Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8290523Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8290954Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T09:19:34.8291801Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8292235Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8293082Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8293487Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8294349Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8294787Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8296278Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 604962816 and is now 632225792. 2025-12-04T09:19:34.8296780Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8297517Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8298625Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8298992Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8299704Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8300257Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.8300712Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8301246Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8302251Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8302755Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8303748Z [rank0]:E1204 09:19:14.322000 26727 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8304198Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8305173Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8305661Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8306615Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8307102Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8308059Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8308620Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8309606Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8310044Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8311467Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 
2025-12-04T09:19:34.8311850Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8312433Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8313415Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8313743Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8314379Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8314871Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.8314965Z dist init r=1, world=4 2025-12-04T09:19:34.8315054Z dist init r=3, world=4 2025-12-04T09:19:34.8315150Z dist init r=0, world=4 2025-12-04T09:19:34.8315235Z dist init r=2, world=4 2025-12-04T09:19:34.8315319Z FAILED [8.3993s] [100%] 2025-12-04T09:19:34.8315324Z 2025-12-04T09:19:34.8315458Z =================================== FAILURES =================================== 2025-12-04T09:19:34.8315721Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda _________ 2025-12-04T09:19:34.8315835Z Traceback (most recent call last): 2025-12-04T09:19:34.8316315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.8316463Z self._join_processes(fn) 2025-12-04T09:19:34.8316984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.8317113Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.8317653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.8317751Z raise RuntimeError(error) 2025-12-04T09:19:34.8317957Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.8318067Z Traceback (most recent call last): 2025-12-04T09:19:34.8318542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8318639Z getattr(self, test_name)() 2025-12-04T09:19:34.8319112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8319192Z fn() 2025-12-04T09:19:34.8319644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8319737Z method(*args, **kwargs) 2025-12-04T09:19:34.8320182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8320277Z method(*args, **kwargs) 2025-12-04T09:19:34.8321026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8321120Z with policy(): 2025-12-04T09:19:34.8321786Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8321891Z raise RuntimeError(msg) 2025-12-04T09:19:34.8323067Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 604962816 and is now 632225792. 2025-12-04T09:19:34.8323079Z 2025-12-04T09:19:34.8323397Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8324062Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8324068Z 2025-12-04T09:19:34.8324330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8324335Z 2025-12-04T09:19:34.8324340Z 2025-12-04T09:19:34.8324557Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.8324823Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.8325679Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-68dc7893385d1617.xml - 2025-12-04T09:19:34.8325858Z =========================== short test summary info ============================ 2025-12-04T09:19:34.8326688Z FAILED [8.3993s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.8326806Z Traceback (most recent call last): 2025-12-04T09:19:34.8327363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8327470Z getattr(self, test_name)() 2025-12-04T09:19:34.8328011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8328098Z fn() 2025-12-04T09:19:34.8328601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8328788Z method(*args, **kwargs) 2025-12-04T09:19:34.8329293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8329402Z method(*args, **kwargs) 2025-12-04T09:19:34.8329904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8329997Z with policy(): 2025-12-04T09:19:34.8330509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8330614Z raise RuntimeError(msg) 2025-12-04T09:19:34.8331775Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 604962816 and is now 632225792. 
2025-12-04T09:19:34.8331792Z 2025-12-04T09:19:34.8332003Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8332667Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8332672Z 2025-12-04T09:19:34.8332942Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8333120Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.8333294Z ======================= 1 failed, 7 deselected in 8.42s ======================== 2025-12-04T09:19:34.8333394Z Got exit code 1 2025-12-04T09:19:34.8333494Z Retrying single test... 2025-12-04T09:19:34.8334261Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14f8a536ecccf07e.xml 2025-12-04T09:19:34.8334513Z ============================= test session starts ============================== 2025-12-04T09:19:34.8334823Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.8334924Z cachedir: .pytest_cache 2025-12-04T09:19:34.8335430Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.8335545Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.8335639Z configfile: pytest.ini 2025-12-04T09:19:34.8336113Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.8336375Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.8337251Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8337360Z Running 1 items in this shard 2025-12-04T09:19:34.8337370Z 2025-12-04T09:19:34.8338403Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda I1204 09:19:20.674000 27004 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 27056 2025-12-04T09:19:34.8338905Z I1204 09:19:20.675000 27004 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 27057 2025-12-04T09:19:34.8339408Z I1204 09:19:20.675000 27004 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 27058 2025-12-04T09:19:34.8339895Z I1204 09:19:20.676000 27004 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 27059 2025-12-04T09:19:34.8341622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8341853Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8343554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8343725Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8345423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8345594Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8347298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8347463Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8347920Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8348460Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8349500Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8350011Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8350894Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8351245Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8352102Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8352537Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8353394Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8353824Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8354669Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8355071Z [rank1]:E1204 09:19:27.430000 27057 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8355977Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8356419Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8357853Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.8358182Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8358769Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8359748Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8360077Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8360711Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8361203Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.8361606Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8362090Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8363024Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8363476Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8364357Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8364705Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8365563Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8365997Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.8366854Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8367287Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8368131Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8368581Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8369436Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8369877Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8371305Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.8371635Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8372222Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8373204Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8373528Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8374155Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8374638Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.8375040Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8375555Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8376516Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8377172Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8378173Z [rank2]:E1204 09:19:27.430000 27058 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8378569Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8379535Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8380019Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8380978Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8381472Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8382496Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8382948Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8383907Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8384401Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8386017Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 604962816 and is now 632225792. 
2025-12-04T09:19:34.8386388Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8387045Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8388155Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8388625Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8389393Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8389882Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.8390329Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8390796Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8391691Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8392138Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8393019Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8393373Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8394234Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8394663Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8395508Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8396007Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8396854Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8397252Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8398102Z [rank3]:E1204 09:19:27.431000 27059 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8398540Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8399968Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 498008064 and is now 632225792. 2025-12-04T09:19:34.8400293Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8400877Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8401865Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8402191Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8402866Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8403351Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.8403439Z dist init r=1, world=4 2025-12-04T09:19:34.8403526Z dist init r=3, world=4 2025-12-04T09:19:34.8403619Z dist init r=0, world=4 2025-12-04T09:19:34.8403702Z dist init r=2, world=4 2025-12-04T09:19:34.8403787Z FAILED [8.5741s] [100%] 2025-12-04T09:19:34.8403791Z 2025-12-04T09:19:34.8403926Z =================================== FAILURES =================================== 2025-12-04T09:19:34.8404188Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda _________ 2025-12-04T09:19:34.8404307Z Traceback (most recent call last): 2025-12-04T09:19:34.8404786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.8404883Z self._join_processes(fn) 2025-12-04T09:19:34.8405411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.8405534Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.8406080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.8406178Z raise RuntimeError(error) 2025-12-04T09:19:34.8406384Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.8406496Z Traceback (most recent call last): 2025-12-04T09:19:34.8406972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8407117Z getattr(self, test_name)() 2025-12-04T09:19:34.8407595Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8407673Z fn() 2025-12-04T09:19:34.8408133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8408224Z method(*args, **kwargs) 2025-12-04T09:19:34.8408670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8408765Z method(*args, **kwargs) 2025-12-04T09:19:34.8409214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8409298Z with policy(): 2025-12-04T09:19:34.8409750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8409846Z raise RuntimeError(msg) 2025-12-04T09:19:34.8410881Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 498008064 and is now 632225792. 2025-12-04T09:19:34.8410888Z 2025-12-04T09:19:34.8411077Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8411660Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8411672Z 2025-12-04T09:19:34.8411912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8411917Z 2025-12-04T09:19:34.8411921Z 2025-12-04T09:19:34.8412110Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.8412341Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.8413155Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14f8a536ecccf07e.xml - 2025-12-04T09:19:34.8413304Z =========================== short test summary info ============================ 2025-12-04T09:19:34.8414034Z FAILED [8.5741s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.8414139Z Traceback (most recent call last): 2025-12-04T09:19:34.8414626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8414727Z getattr(self, test_name)() 2025-12-04T09:19:34.8415200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8415284Z fn() 2025-12-04T09:19:34.8415728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8415823Z method(*args, **kwargs) 2025-12-04T09:19:34.8416361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8416457Z method(*args, **kwargs) 2025-12-04T09:19:34.8417104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8417203Z with policy(): 2025-12-04T09:19:34.8417707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8417820Z raise RuntimeError(msg) 2025-12-04T09:19:34.8418984Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 498008064 and is now 632225792. 2025-12-04T09:19:34.8419050Z 2025-12-04T09:19:34.8419267Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8419926Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8419931Z 2025-12-04T09:19:34.8420197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8420378Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:19:34.8420550Z ======================= 1 failed, 7 deselected in 8.60s ======================== 2025-12-04T09:19:34.8420645Z Got exit code 1 2025-12-04T09:19:34.8421430Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8421838Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.8422525Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-77e61ff77a3b19cd.xml 2025-12-04T09:19:34.8422683Z ============================= test session starts ============================== 2025-12-04T09:19:34.8423027Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.8423138Z cachedir: .pytest_cache 2025-12-04T09:19:34.8423649Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.8423767Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.8423876Z configfile: pytest.ini 2025-12-04T09:19:34.8424408Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.8424620Z collecting ... collected 8 items / 8 deselected / 0 selected 2025-12-04T09:19:34.8424758Z stepcurrent: skipping 8 already run items. 2025-12-04T09:19:34.8424867Z Running 0 items in this shard 2025-12-04T09:19:34.8424976Z 2025-12-04T09:19:34.8425838Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-77e61ff77a3b19cd.xml - 2025-12-04T09:19:34.8426002Z ============================ 8 deselected in 0.01s ============================= 2025-12-04T09:19:34.8431417Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda'] 2025-12-04T09:19:34.8431429Z 2025-12-04T09:19:34.8432100Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_exec_order 1/1 (test/test-reports/distributed.fsdp.test_fsdp_exec_order_1.1_a2a67ccbd845e856_.log) 2025-12-04T09:19:34.8432177Z 2025-12-04T09:19:34.8432577Z Finished distributed/fsdp/test_fsdp_exec_order 1/1 ... 
[2025-12-04 09:19:34.544881][1606.152791418], took 5.35min 2025-12-04T09:19:34.8433516Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-93c7f0a0a61745d5.xml 2025-12-04T09:19:34.8434347Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-50fd36707db41f77.xml 2025-12-04T09:19:34.8435144Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-434f2a168fab2502.xml 2025-12-04T09:19:34.8435953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-810575b51f00acc3.xml 2025-12-04T09:19:34.8436756Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-acd65444fa26961a.xml 2025-12-04T09:19:34.8437567Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d7f6d912312cc834.xml 2025-12-04T09:19:34.8438369Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d3fa58c4cf34965f.xml 2025-12-04T09:19:34.8439164Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5b8ecd9108f02ac.xml 2025-12-04T09:19:34.8537798Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-578e4c4077b7a803.xml 2025-12-04T09:19:34.8819300Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14d4a314808f55fe.xml 2025-12-04T09:19:34.9117196Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-72b90a4f7545df10.xml 2025-12-04T09:19:34.9424132Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cc094df1219cfd82.xml 2025-12-04T09:19:34.9827006Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-94627d53ab92538d.xml 2025-12-04T09:19:35.0161066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f49c40cee39994b2.xml 2025-12-04T09:19:35.0479997Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a8869f6ed51873ac.xml 2025-12-04T09:19:35.0819768Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-90a4ba7c1fd04d10.xml 2025-12-04T09:19:35.1139087Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ccaa5b3b6bf09af7.xml 2025-12-04T09:19:35.1450489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ca39f8152ef39349.xml 2025-12-04T09:19:35.1739862Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7178045a44a28781.xml 2025-12-04T09:19:35.2071063Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cdb7b80b8b392fad.xml 2025-12-04T09:19:35.2351611Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-9595731043617943.xml 2025-12-04T09:19:35.2609331Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f8bd87b046fcc0d3.xml 2025-12-04T09:19:35.2922771Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-68dc7893385d1617.xml 2025-12-04T09:19:35.3238800Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14f8a536ecccf07e.xml 2025-12-04T09:19:35.3578851Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-77e61ff77a3b19cd.xml 2025-12-04T09:19:35.6738512Z Uploading logs for 57116084904 to S3 2025-12-04T09:19:35.7157544Z Uploading artifacts took 0.33 seconds 2025-12-04T09:19:35.7158121Z distributed/fsdp/test_fsdp_exec_order 1/1 failed! 2025-12-04T09:19:35.7163553Z Running distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 ... [2025-12-04 09:19:35.715768][1607.323683885] 2025-12-04T09:19:35.7164207Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:19:35.7165505Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_hsdp_dtensor_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:19:35.716087] 2025-12-04T09:25:19.6288802Z 2025-12-04T09:25:19.6290028Z PRINTING LOG FILE of distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 (test/test-reports/distributed.fsdp.test_hsdp_dtensor_state_dict_1.1_8591eb8b13b136e6_.log) 2025-12-04T09:25:19.6291619Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a78dec0d79621f36.xml 2025-12-04T09:25:19.6292871Z ============================= test session starts ============================== 2025-12-04T09:25:19.6293531Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.6294111Z cachedir: .pytest_cache 2025-12-04T09:25:19.6294818Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.6295588Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.6295921Z configfile: pytest.ini 2025-12-04T09:25:19.6296936Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.6297750Z collecting ... collected 8 items 2025-12-04T09:25:19.6298171Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:25:19.6304793Z Running 8 items in this shard: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.6311624Z 2025-12-04T09:25:19.6312942Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda I1204 09:19:39.134000 27390 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 27442 2025-12-04T09:25:19.6314874Z I1204 09:19:39.135000 27390 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 27443 2025-12-04T09:25:19.6316034Z I1204 09:19:39.136000 27390 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 27444 2025-12-04T09:25:19.6317172Z I1204 09:19:39.136000 27390 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 27445 2025-12-04T09:25:19.6320252Z 
/var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6323168Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6325900Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6328565Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6331157Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6333801Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6336492Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6339142Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6343893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:25:19.6349069Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6353966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6358834Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6363773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6368614Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6373480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
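Each of the stream-mismatch warnings above ends by naming an opt-out switch. A one-line sketch of using it, with the function name taken directly from the warning text; this is only appropriate when the mismatch is known to be intentional:

    import torch

    # Named in the UserWarning above; silences the AccumulateGrad
    # stream-mismatch warning for the rest of the process.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)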
2025-12-04T09:25:19.6382897Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6383876Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6384961Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6386602Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6388201Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6389879Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6391343Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6392745Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6394239Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6395738Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6397230Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6398833Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6400285Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6401754Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6403254Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6405610Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
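The leak report above compares two numbers per device: bytes held by the caching allocator and bytes the CUDA driver has allocated, each sampled before and after the test. A rough sketch of that kind of before/after comparison using public torch.cuda APIs, assuming a CUDA device is available; this is illustrative only, not the test harness's actual leak checker:

    import torch

    def driver_allocated(device: int) -> int:
        # Stand-in for the "CUDA driver allocated memory" figure in the error:
        # total device memory minus what the driver currently reports as free.
        free, total = torch.cuda.mem_get_info(device)
        return total - free

    dev = 0
    alloc_before = torch.cuda.memory_allocated(dev)   # caching allocator bytes
    driver_before = driver_allocated(dev)

    # ... run the workload under test here ...

    torch.cuda.synchronize(dev)
    leaked = torch.cuda.memory_allocated(dev) - alloc_before
    if leaked > 0 and driver_allocated(dev) > driver_before:
        raise RuntimeError(f"possible CUDA memory leak: {leaked} bytes still allocated")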
2025-12-04T09:25:19.6407830Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6408903Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6410989Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6412815Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6413962Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6415343Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.6416483Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6417728Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6419358Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6421139Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6422736Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6424216Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6425675Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6427216Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6428752Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6430393Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6431949Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6433619Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6435042Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6436497Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6438784Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.6440931Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6441985Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6444036Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6445710Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6446807Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6448010Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.6448977Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6449937Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6451377Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6452801Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6454448Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6455839Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6457512Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6459057Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6460660Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6462201Z 
E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6463739Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6465226Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6466726Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6468270Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6470707Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.6472728Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6473711Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6475647Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6477317Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6478361Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6479566Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.6480536Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6481497Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6482955Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6484377Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6485787Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6487102Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6488399Z E1204 
09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6489770Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6491176Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6492547Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6493925Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6495264Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6496865Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6498416Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6500839Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 302972928 and is now 613351424. 
2025-12-04T09:25:19.6503117Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6504276Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6506427Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6508300Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6509626Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6510830Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.6511681Z FAILED [9.3991s] [ 12%] 2025-12-04T09:25:19.6511854Z 2025-12-04T09:25:19.6511998Z =================================== FAILURES =================================== 2025-12-04T09:25:19.6512740Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.6513457Z Traceback (most recent call last): 2025-12-04T09:25:19.6514207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.6514953Z self._join_processes(fn) 2025-12-04T09:25:19.6515711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.6516532Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.6517372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.6518178Z raise RuntimeError(error) 2025-12-04T09:25:19.6518604Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.6519074Z Traceback (most recent call last): 2025-12-04T09:25:19.6519808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6520640Z getattr(self, test_name)() 2025-12-04T09:25:19.6521696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6522477Z fn() 2025-12-04T09:25:19.6523116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6523876Z method(*args, **kwargs) 2025-12-04T09:25:19.6524594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6525343Z method(*args, **kwargs) 2025-12-04T09:25:19.6526061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6526816Z with policy(): 2025-12-04T09:25:19.6527505Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6528265Z raise RuntimeError(msg) 
2025-12-04T09:25:19.6529870Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 2025-12-04T09:25:19.6531402Z 2025-12-04T09:25:19.6531620Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6532920Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6534153Z 2025-12-04T09:25:19.6534431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6534822Z 2025-12-04T09:25:19.6534826Z 2025-12-04T09:25:19.6535051Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.6535764Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.6537329Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a78dec0d79621f36.xml - 2025-12-04T09:25:19.6538581Z =========================== short test summary info ============================ 2025-12-04T09:25:19.6539988Z FAILED [9.3991s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.6541347Z Traceback (most recent call last): 2025-12-04T09:25:19.6542146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6542962Z getattr(self, test_name)() 2025-12-04T09:25:19.6543709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6544484Z fn() 2025-12-04T09:25:19.6545137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6545887Z method(*args, **kwargs) 2025-12-04T09:25:19.6546606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6547368Z method(*args, **kwargs) 2025-12-04T09:25:19.6548082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6549030Z with policy(): 2025-12-04T09:25:19.6549677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6550488Z raise RuntimeError(msg) 2025-12-04T09:25:19.6551976Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
2025-12-04T09:25:19.6553402Z 2025-12-04T09:25:19.6553606Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6554826Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6555845Z 2025-12-04T09:25:19.6556095Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6556653Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.6557099Z ============================== 1 failed in 9.42s =============================== 2025-12-04T09:25:19.6557480Z Got exit code 1 2025-12-04T09:25:19.6557736Z Retrying single test... 2025-12-04T09:25:19.6558620Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9a14ac4718e66e44.xml 2025-12-04T09:25:19.6559620Z ============================= test session starts ============================== 2025-12-04T09:25:19.6560240Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.6560811Z cachedir: .pytest_cache 2025-12-04T09:25:19.6561469Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.6562235Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.6562572Z configfile: pytest.ini 2025-12-04T09:25:19.6563250Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.6564126Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.6565411Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6566598Z Running 1 items in this shard 2025-12-04T09:25:19.6566796Z 2025-12-04T09:25:19.6568020Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda I1204 09:19:53.214000 27783 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 27835 2025-12-04T09:25:19.6569864Z I1204 09:19:53.215000 27783 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 27836 2025-12-04T09:25:19.6570878Z I1204 09:19:53.216000 27783 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 27837 2025-12-04T09:25:19.6571888Z I1204 09:19:53.216000 27783 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 27838 2025-12-04T09:25:19.6574607Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.6577258Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6579928Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6582567Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6585184Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6587814Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6590333Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6592677Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6596882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6601385Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6605858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6610292Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6614837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6619847Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6625239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:25:19.6630315Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6631357Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6632452Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6634105Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6635515Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6636946Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6638270Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6639568Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6640936Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6642299Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6643668Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6645108Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6646442Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6647767Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6649137Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6651294Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 170852352 and is now 617545728. 
2025-12-04T09:25:19.6653327Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6654324Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6656282Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6658297Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6659473Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6660868Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.6661952Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6663016Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6664642Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6666246Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6667841Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6669351Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6670636Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6672007Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6673376Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6674789Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6676150Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6677477Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6678816Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6680192Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6682342Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.6684352Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6685349Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6687252Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6688917Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6689962Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6691178Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.6692144Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6693097Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6694544Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6695954Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6697714Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6699197Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6700653Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6702198Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6703807Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6705356Z 
E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6706899Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6708405Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6709893Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6711256Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6713411Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 649003008 and is now 722403328. 2025-12-04T09:25:19.6715452Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6716442Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6718378Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6719992Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6721372Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6722734Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.6723824Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6724895Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6726535Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6728134Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6729742Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6731223Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6732667Z E1204 
09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6734335Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6735800Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6737478Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6739022Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6740504Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6742019Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6743586Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6746016Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 
2025-12-04T09:25:19.6748309Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6749488Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6751651Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6753334Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6754378Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6755591Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.6756263Z FAILED [9.5266s] [100%] 2025-12-04T09:25:19.6756441Z 2025-12-04T09:25:19.6756582Z =================================== FAILURES =================================== 2025-12-04T09:25:19.6757289Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.6757956Z Traceback (most recent call last): 2025-12-04T09:25:19.6758670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.6759390Z self._join_processes(fn) 2025-12-04T09:25:19.6760113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.6760884Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.6761683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.6762467Z raise RuntimeError(error) 2025-12-04T09:25:19.6762863Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.6763307Z Traceback (most recent call last): 2025-12-04T09:25:19.6764010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6764780Z getattr(self, test_name)() 2025-12-04T09:25:19.6765451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6766140Z fn() 2025-12-04T09:25:19.6766720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6767383Z method(*args, **kwargs) 2025-12-04T09:25:19.6768028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6768705Z method(*args, **kwargs) 2025-12-04T09:25:19.6769340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6770004Z with policy(): 2025-12-04T09:25:19.6770614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6771304Z raise RuntimeError(msg) 
2025-12-04T09:25:19.6772726Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.6774063Z 2025-12-04T09:25:19.6774261Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6775420Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6776481Z 2025-12-04T09:25:19.6776906Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6777310Z 2025-12-04T09:25:19.6777315Z 2025-12-04T09:25:19.6777556Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.6778234Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.6779570Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9a14ac4718e66e44.xml - 2025-12-04T09:25:19.6780820Z =========================== short test summary info ============================ 2025-12-04T09:25:19.6782224Z FAILED [9.5266s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.6783571Z Traceback (most recent call last): 2025-12-04T09:25:19.6784357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6785161Z getattr(self, test_name)() 2025-12-04T09:25:19.6785923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6786689Z fn() 2025-12-04T09:25:19.6787338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6788104Z method(*args, **kwargs) 2025-12-04T09:25:19.6788926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6789593Z method(*args, **kwargs) 2025-12-04T09:25:19.6790231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6790902Z with policy(): 2025-12-04T09:25:19.6791501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6792191Z raise RuntimeError(msg) 2025-12-04T09:25:19.6793658Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 
2025-12-04T09:25:19.6794999Z 2025-12-04T09:25:19.6795205Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6796356Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6797304Z 2025-12-04T09:25:19.6797540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6798068Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.6798524Z ======================= 1 failed, 7 deselected in 9.55s ======================== 2025-12-04T09:25:19.6798909Z Got exit code 1 2025-12-04T09:25:19.6799141Z Retrying single test... 2025-12-04T09:25:19.6799985Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7d115d367e840460.xml 2025-12-04T09:25:19.6800930Z ============================= test session starts ============================== 2025-12-04T09:25:19.6801510Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.6802045Z cachedir: .pytest_cache 2025-12-04T09:25:19.6802679Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.6803409Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.6803715Z configfile: pytest.ini 2025-12-04T09:25:19.6804366Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.6805186Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.6806387Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6807507Z Running 1 items in this shard 2025-12-04T09:25:19.6807705Z 2025-12-04T09:25:19.6808850Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda I1204 09:20:07.394000 28176 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 28228 2025-12-04T09:25:19.6810557Z I1204 09:20:07.395000 28176 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 28229 2025-12-04T09:25:19.6811574Z I1204 09:20:07.396000 28176 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 28230 2025-12-04T09:25:19.6812567Z I1204 09:20:07.396000 28176 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 28231 2025-12-04T09:25:19.6815283Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
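The FutureWarning above points to the newer torch.distributed.checkpoint.state_dict APIs (see the doc link it prints). A minimal migration sketch, assuming model and optim stand for an FSDP-wrapped module and its optimizer like the test's fixtures, and assuming StateDictOptions.cpu_offload corresponds to the offload_to_cpu flag in the test name:

    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions, get_state_dict, set_state_dict,
    )

    def roundtrip_state_dict(model, optim, offload_to_cpu: bool = False):
        # model/optim are placeholders for the FSDP-wrapped module and optimizer
        opts = StateDictOptions(cpu_offload=offload_to_cpu)
        model_sd, optim_sd = get_state_dict(model, optim, options=opts)
        # ... a checkpointing step would save and reload model_sd / optim_sd here ...
        set_state_dict(model, optim,
                       model_state_dict=model_sd, optim_state_dict=optim_sd,
                       options=opts)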
2025-12-04T09:25:19.6818029Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6821422Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6824104Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6826713Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6829335Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6831940Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6834788Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6839220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6843998Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6848766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6853476Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6858632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6863649Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6868677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
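The UserWarning repeated above names its own off-switch for cases where the stream mismatch is intentional; a one-line sketch of using it:

    import torch

    # Disable the AccumulateGrad stream-mismatch warning, as the warning text suggests;
    # the alternative it describes is dropping references to the autograd graph or doing
    # DDP initialization on the same stream as later forwards.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)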
2025-12-04T09:25:19.6873277Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6874172Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6875125Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6876570Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6878184Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6879691Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6881071Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6882444Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6883899Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6885355Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6886803Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6888289Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6889798Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6891145Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6892518Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6894675Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 617545728. 
2025-12-04T09:25:19.6916387Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6917456Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6919482Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6921673Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6922867Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6924304Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.6925370Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6926425Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6928048Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6929638Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6931250Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6932728Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6934325Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6935695Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6937365Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6939016Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6940566Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6942074Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6943577Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6945133Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6947569Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 649003008 and is now 722403328. 2025-12-04T09:25:19.6950095Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6951187Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6953269Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6955086Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6956238Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6957609Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.6958668Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6959700Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6961285Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6962845Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6964504Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6965970Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6967256Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6968623Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6970001Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6971417Z 
E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6972961Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6974367Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6975791Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6977542Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6979979Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.6982271Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6983381Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6985535Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6987425Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6988734Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6990050Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.6991022Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6991982Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6993431Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6994855Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6996264Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6997580Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6998876Z E1204 
09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7000243Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7001696Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7003067Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7004438Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7005763Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7007107Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7008475Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7010632Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 481230848 and is now 613351424. 
2025-12-04T09:25:19.7012657Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7013654Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7015585Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7017600Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7018780Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7020135Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.7021081Z FAILED [9.5127s] [100%] 2025-12-04T09:25:19.7021263Z 2025-12-04T09:25:19.7021417Z =================================== FAILURES =================================== 2025-12-04T09:25:19.7022219Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.7022987Z Traceback (most recent call last): 2025-12-04T09:25:19.7023792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.7024588Z self._join_processes(fn) 2025-12-04T09:25:19.7025399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.7026280Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.7027170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.7028026Z raise RuntimeError(error) 2025-12-04T09:25:19.7028485Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.7028992Z Traceback (most recent call last): 2025-12-04T09:25:19.7029772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7030582Z getattr(self, test_name)() 2025-12-04T09:25:19.7031453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7032244Z fn() 2025-12-04T09:25:19.7032990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7033712Z method(*args, **kwargs) 2025-12-04T09:25:19.7034388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7035096Z method(*args, **kwargs) 2025-12-04T09:25:19.7035783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7036502Z with policy(): 2025-12-04T09:25:19.7037153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7037871Z raise RuntimeError(msg) 
2025-12-04T09:25:19.7039377Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 617545728. 2025-12-04T09:25:19.7040817Z 2025-12-04T09:25:19.7041026Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7042250Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7043293Z 2025-12-04T09:25:19.7043550Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7043943Z 2025-12-04T09:25:19.7043949Z 2025-12-04T09:25:19.7044166Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.7044768Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.7046081Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7d115d367e840460.xml - 2025-12-04T09:25:19.7047249Z =========================== short test summary info ============================ 2025-12-04T09:25:19.7048582Z FAILED [9.5127s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.7049848Z Traceback (most recent call last): 2025-12-04T09:25:19.7050599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7051344Z getattr(self, test_name)() 2025-12-04T09:25:19.7052069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7052807Z fn() 2025-12-04T09:25:19.7053410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7054130Z method(*args, **kwargs) 2025-12-04T09:25:19.7054895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7055573Z method(*args, **kwargs) 2025-12-04T09:25:19.7056279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7057173Z with policy(): 2025-12-04T09:25:19.7057909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7058686Z raise RuntimeError(msg) 2025-12-04T09:25:19.7060345Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 617545728. 
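The repro line the log keeps printing is an environment-variable-driven invocation of the test file. A sketch of the same rerun driven from Python, assuming subprocess is acceptable and the working directory is the base repo dir as the log instructs:

    import os
    import subprocess

    env = dict(os.environ,
               PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",   # enable the per-test leak check
               PYTORCH_PRINT_REPRO_ON_FAILURE="0")     # optional: silence the repro banner
    subprocess.run(
        ["python", "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
         "TestHSDPWithDeviceMeshAndDTensorCUDA."
         "test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda"],
        env=env, check=False)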
2025-12-04T09:25:19.7061890Z 2025-12-04T09:25:19.7062109Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7063412Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7064485Z 2025-12-04T09:25:19.7064771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7065350Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.7065858Z ======================= 1 failed, 7 deselected in 9.53s ======================== 2025-12-04T09:25:19.7066289Z Got exit code 1 2025-12-04T09:25:19.7067321Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7068796Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.7069963Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-724e16d7d24ec18b.xml 2025-12-04T09:25:19.7070908Z ============================= test session starts ============================== 2025-12-04T09:25:19.7071550Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.7072074Z cachedir: .pytest_cache 2025-12-04T09:25:19.7072709Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.7073411Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.7073747Z configfile: pytest.ini 2025-12-04T09:25:19.7074394Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.7075179Z collecting ... collected 8 items / 1 deselected / 7 selected 2025-12-04T09:25:19.7075612Z stepcurrent: skipping 1 already run items. 2025-12-04T09:25:19.7075946Z Running 7 items in this shard 2025-12-04T09:25:19.7076144Z 2025-12-04T09:25:19.7077292Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 09:20:21.564000 28569 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 28621 2025-12-04T09:25:19.7078987Z I1204 09:20:21.564000 28569 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 28622 2025-12-04T09:25:19.7080006Z I1204 09:20:21.565000 28569 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 28623 2025-12-04T09:25:19.7081002Z I1204 09:20:21.566000 28569 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 28624 2025-12-04T09:25:19.7083729Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7086071Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7088442Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7090797Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7093101Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7095441Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7098140Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7100777Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7105513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.7110504Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.7114980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. 
This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.7119438Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.7124689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.7129712Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.7134730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:25:19.7139776Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.7140796Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7141867Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7143498Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7145097Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7146696Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7148159Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7149625Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7150995Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7152370Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7153739Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7155141Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7156474Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7157815Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7159195Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7161344Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 617545728. 
2025-12-04T09:25:19.7163355Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7164358Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7166256Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7167914Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7168967Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7170184Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.7171154Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7172108Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7173554Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7174964Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7176463Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7178106Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7179547Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7181080Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7182613Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7184212Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7185752Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7187225Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7188717Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7190192Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7192324Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 617545728. 2025-12-04T09:25:19.7194336Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7195312Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7197196Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7198845Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7199911Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7201103Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.7202056Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7203001Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7204442Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7205853Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7207253Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7208551Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7209827Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7211191Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7212601Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7213952Z 
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935]     with policy():
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935]     raise RuntimeError(msg)
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 523173888 and is now 613351424.
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
E1204 09:20:29.516000 28621 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[traceback frames and repro instructions for process 28621 identical to process 28623 above]
E1204 09:20:29.516000 28621 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 649003008 and is now 722403328.
E1204 09:20:29.516000 28621 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
FAILED [9.0335s] [ 14%]

=================================== FAILURES ===================================
_ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
    self._join_processes(fn)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
    self._check_return_codes(fn, elapsed_time)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
    raise RuntimeError(error)
RuntimeError: Process 1 exited with error code 10 and exception:
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
    getattr(self, test_name)()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
    fn()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
    method(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
    method(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
    with policy():
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
    raise RuntimeError(msg)
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 617545728.

To execute this test, run the following from the base repo dir:
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------- Captured stdout call -----------------------------
Process 1 terminated with exit code 10, terminating remaining processes.
- generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-724e16d7d24ec18b.xml -
=========================== short test summary info ============================
FAILED [9.0335s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
[traceback, leak message, and repro instructions repeated from the FAILURES block above]
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
======================= 1 failed, 1 deselected in 9.06s ========================
Got exit code 1
Retrying single test...
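For triage: the failure above is a straight before/after comparison made by the harness around each test. A minimal sketch of that bookkeeping against the public torch.cuda API (the helper name is ours; the real checker in torch/testing/_internal/common_utils.py does additional driver-level cross-checking before declaring a leak):

import gc
import torch

def check_for_cuda_leak(fn, device: int = 0):
    """Run fn() and complain if CUDA memory did not return to baseline.

    Sketch only, mirroring the shape of the message in the log above;
    not the actual implementation used by the test suite.
    """
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)    # caching allocator view
    free_before, total = torch.cuda.mem_get_info(device)  # driver view
    driver_before = total - free_before

    fn()

    gc.collect()
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if alloc_after > alloc_before:
        raise RuntimeError(
            f"Caching allocator allocated memory was {alloc_before} "
            f"and is now reported as {alloc_after} on device {device}. "
            f"CUDA driver allocated memory was {driver_before} "
            f"and is now {driver_after}."
        )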
Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1c81c8f34feb9c16.xml
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
cachedir: .pytest_cache
hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
rootdir: /var/lib/jenkins/workspace
configfile: pytest.ini
plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
collecting ... collected 8 items / 7 deselected / 1 selected
stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda
Running 1 items in this shard

distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 09:20:35.634000 28962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 29014
I1204 09:20:35.635000 28962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 29015
I1204 09:20:35.636000 28962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 29016
I1204 09:20:35.636000 28962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 29017
/var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  FSDP.set_state_dict_type(
[FutureWarning emitted once per rank; three further identical copies omitted]
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[UserWarning emitted once per rank; three further identical copies omitted]
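The FutureWarning above names its replacement APIs outright. A rough sketch of the migration it asks for, using the model-only variants of the torch.distributed.checkpoint.state_dict module it links to (illustrative only, not a patch for this test file; the warning itself cites get_state_dict()/set_state_dict(), which additionally handle optimizer state):

# Deprecated pattern flagged above:
#   FSDP.set_state_dict_type(model, StateDictType.SHARDED_STATE_DICT, ...)
#   state_dict = model.state_dict()
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_model_state_dict,
    set_model_state_dict,
)

def save_and_reload(model):
    # cpu_offload=True mirrors the offload_to_cpu=True flavor of this test.
    opts = StateDictOptions(full_state_dict=False, cpu_offload=True)
    state_dict = get_model_state_dict(model, options=opts)
    set_model_state_dict(model, state_dict, options=opts)
    return state_dict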
E1204 09:20:43.620000 29016 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[traceback frames and repro instructions identical to the first run; repeated on every rank below]
E1204 09:20:43.620000 29016 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 615448576.
E1204 09:20:43.620000 29016 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
E1204 09:20:43.620000 29014 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak [...] on device 0. CUDA driver allocated memory was 640614400 and is now 722403328.
E1204 09:20:43.620000 29014 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
E1204 09:20:43.621000 29017 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak [...] on device 3. CUDA driver allocated memory was 531562496 and is now 613351424.
E1204 09:20:43.621000 29017 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
E1204 09:20:43.622000 29015 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak [...] on device 1. CUDA driver allocated memory was 518979584 and is now 613351424.
E1204 09:20:43.622000 29015 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
FAILED [9.9545s] [100%]

=================================== FAILURES ===================================
_ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _
[main-process traceback identical to the first run]
RuntimeError: Process 1 exited with error code 10 and exception:
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 613351424.

To execute this test, run the following from the base repo dir:
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------- Captured stdout call -----------------------------
Process 1 terminated with exit code 10, terminating remaining processes.
- generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1c81c8f34feb9c16.xml -
=========================== short test summary info ============================
FAILED [9.9545s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
[traceback, leak message, and repro instructions repeated from the FAILURES block above]
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
======================= 1 failed, 7 deselected in 9.98s ========================
Got exit code 1
Retrying single test...
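Note the shape of the failure so far: on every rank and every attempt, exactly 13824 bytes are still live in the caching allocator after teardown, which points at a lingering reference rather than allocator noise. A hypothetical teardown-style assertion one could use when reproducing locally (function name ours; the process-group teardown matters because communicator state can also pin device memory):

import gc
import torch
import torch.distributed as dist

def assert_no_dangling_cuda_memory(device: int, baseline: int = 0):
    """After dropping all references, the caching allocator should
    report the pre-test baseline again. Sketch only."""
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()  # release communicator resources
    gc.collect()                      # drop unreachable tensors
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()          # return cached, unused blocks to the driver
    leaked = torch.cuda.memory_allocated(device) - baseline
    assert leaked == 0, f"{leaked} bytes still allocated on device {device}"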
Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a326f09bb7c5e616.xml
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
cachedir: .pytest_cache
hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
rootdir: /var/lib/jenkins/workspace
configfile: pytest.ini
plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
collecting ... collected 8 items / 7 deselected / 1 selected
stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda
Running 1 items in this shard

distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 09:20:50.194000 29355 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 29407
I1204 09:20:50.195000 29355 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 29408
I1204 09:20:50.195000 29355 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 29409
I1204 09:20:50.196000 29355 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 29410
[the same FutureWarning (test_hsdp_dtensor_state_dict.py:243) and AccumulateGrad UserWarning (torch/autograd/graph.py:865) are emitted once per rank, as in the previous run; omitted]
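The AccumulateGrad UserWarning repeated above spells out its own opt-out. If the stream mismatch were judged intentional, the switch it names (quoted verbatim from the warning; present in the torch build under test) could be flipped globally:

import torch

# Opt-out named in the warning text; only appropriate when the
# AccumulateGrad stream mismatch is known to be intentional.
torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)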
2025-12-04T09:25:19.7624191Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.7625678Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7626750Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7628375Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7629961Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7631544Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7633202Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7634488Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7635857Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7637214Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7638574Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7640020Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7641350Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7642684Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7644044Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7646192Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 734986240. 
2025-12-04T09:25:19.7648207Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7649181Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7651075Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7652724Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7653759Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7654966Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.7655958Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7657287Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7658891Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7660480Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7662066Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7663539Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7664981Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7666509Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7668052Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7669593Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7671001Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7672313Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7673646Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7675015Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7677161Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.7679170Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7680146Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7682037Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7683682Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7684720Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7685939Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.7686882Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7687829Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7689262Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7690674Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7692073Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7693376Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7694653Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7696008Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7697705Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7699283Z 
E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7700826Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7702321Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7703823Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7705360Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7707758Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.7710001Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7710974Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7712858Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7714508Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7715548Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7716734Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.7717685Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7718626Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7720050Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7721824Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7723400Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7724855Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7726294Z E1204 
09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7727813Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7729433Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7730952Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7732475Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7734145Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7735749Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7737499Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7739918Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 613351424. 
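The leak records above come from comparing per-device memory counters taken before and after the test body. A minimal, illustrative sketch of that before/after idea using public torch.cuda counters (not the actual CudaMemoryLeakCheck implementation in common_utils.py, which also consults the driver and retries) could look like this, assuming a CUDA-capable machine:

    import torch

    def assert_no_cuda_leak(fn):
        # Snapshot caching-allocator usage on every visible device, run the
        # workload, then snapshot again; any growth is reported in the same
        # terms the log records above use.
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        before = [torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())]
        fn()
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        after = [torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())]
        for dev, (b, a) in enumerate(zip(before, after)):
            if a > b:
                raise RuntimeError(
                    f"Caching allocator allocated memory was {b} and is now {a} on device {dev}"
                )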
2025-12-04T09:25:19.7742209Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7743324Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7745496Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7747364Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7748666Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7750071Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.7750769Z FAILED [9.3034s] [100%] 2025-12-04T09:25:19.7750940Z 2025-12-04T09:25:19.7751079Z =================================== FAILURES =================================== 2025-12-04T09:25:19.7751809Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.7752498Z Traceback (most recent call last): 2025-12-04T09:25:19.7753225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.7753964Z self._join_processes(fn) 2025-12-04T09:25:19.7754704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.7755504Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.7756313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.7757112Z raise RuntimeError(error) 2025-12-04T09:25:19.7757529Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.7757974Z Traceback (most recent call last): 2025-12-04T09:25:19.7758692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7759428Z getattr(self, test_name)() 2025-12-04T09:25:19.7760194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7760910Z fn() 2025-12-04T09:25:19.7761508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7762207Z method(*args, **kwargs) 2025-12-04T09:25:19.7762861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7763566Z method(*args, **kwargs) 2025-12-04T09:25:19.7764229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7764929Z with policy(): 2025-12-04T09:25:19.7765547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7766258Z raise RuntimeError(msg) 
2025-12-04T09:25:19.7767746Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.7769212Z 2025-12-04T09:25:19.7769411Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7770532Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7771475Z 2025-12-04T09:25:19.7771738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7772094Z 2025-12-04T09:25:19.7772098Z 2025-12-04T09:25:19.7772294Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.7772842Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.7774045Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a326f09bb7c5e616.xml - 2025-12-04T09:25:19.7775133Z =========================== short test summary info ============================ 2025-12-04T09:25:19.7776452Z FAILED [9.3034s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.7777930Z Traceback (most recent call last): 2025-12-04T09:25:19.7778709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7779503Z getattr(self, test_name)() 2025-12-04T09:25:19.7780254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7781015Z fn() 2025-12-04T09:25:19.7781642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7782402Z method(*args, **kwargs) 2025-12-04T09:25:19.7783105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7783846Z method(*args, **kwargs) 2025-12-04T09:25:19.7784536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7785274Z with policy(): 2025-12-04T09:25:19.7785950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7786700Z raise RuntimeError(msg) 2025-12-04T09:25:19.7788329Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 
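The traceback above ends in _join_processes/_check_return_codes: each rank runs in its own process, exits with code 10 when the leak check trips, and the parent turns any non-zero exit code into the RuntimeError shown. A rough stand-alone sketch of that join-and-check pattern, using plain multiprocessing rather than the torch test harness:

    import multiprocessing as mp

    def _worker(rank):
        # A failing rank exits with a non-zero code, like the
        # "exiting process N with exit code: 10" lines above.
        raise SystemExit(10)

    if __name__ == "__main__":
        procs = [mp.Process(target=_worker, args=(rank,)) for rank in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")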
2025-12-04T09:25:19.7789828Z 2025-12-04T09:25:19.7790019Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7791154Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7792095Z 2025-12-04T09:25:19.7792331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7792840Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.7793278Z ======================= 1 failed, 7 deselected in 9.32s ======================== 2025-12-04T09:25:19.7793639Z Got exit code 1 2025-12-04T09:25:19.7794526Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7795759Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.7796910Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7096ae518bc839e.xml 2025-12-04T09:25:19.7797837Z ============================= test session starts ============================== 2025-12-04T09:25:19.7798403Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.7798953Z cachedir: .pytest_cache 2025-12-04T09:25:19.7799576Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.7800261Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.7800588Z configfile: pytest.ini 2025-12-04T09:25:19.7801217Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.7801399Z collecting ... collected 8 items / 2 deselected / 6 selected 2025-12-04T09:25:19.7801526Z stepcurrent: skipping 2 already run items. 2025-12-04T09:25:19.7801625Z Running 6 items in this shard 2025-12-04T09:25:19.7801631Z 2025-12-04T09:25:19.7802790Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 09:21:04.174000 29748 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 29800 2025-12-04T09:25:19.7803233Z I1204 09:21:04.174000 29748 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 29801 2025-12-04T09:25:19.7803678Z I1204 09:21:04.175000 29748 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 29802 2025-12-04T09:25:19.7804125Z I1204 09:21:04.176000 29748 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 29803 2025-12-04T09:25:19.7806260Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7806367Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7808542Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7808648Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7810768Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7810874Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7812984Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7813094Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7814629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7814784Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7816405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7816516Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7818393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.7818517Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7820233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7820357Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7820964Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7821475Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7822457Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7823050Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7824019Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7824405Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7825343Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7825815Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7826758Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7827219Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7828164Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7828584Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7829574Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7830044Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7831924Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 636420096 and is now 722403328. 2025-12-04T09:25:19.7832263Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7832982Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7834279Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7834604Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7835263Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7835752Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.7836162Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7836637Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7837610Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7838078Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7838980Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7839333Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7840209Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7840647Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7841532Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7841961Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7842849Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7843273Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7844334Z E1204 09:21:11.660000 29802 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7844826Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7846581Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.7846920Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7847533Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7848872Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7849197Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7849875Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7850377Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.7850788Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7851284Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7852274Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7852755Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7853686Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7854054Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7855044Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7855477Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7856438Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7857058Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7858009Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7858466Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7859416Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7859909Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7861714Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.7862069Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7862708Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7864085Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7864425Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7865120Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7865642Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.7866067Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7866638Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7867616Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7868106Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7869242Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:25:19.7869582Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7870417Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7870827Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7871666Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7872074Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7872938Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7873310Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7874219Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7874639Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7876248Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 502202368 and is now 613351424. 
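The repro command repeated throughout this log simply sets one environment variable before invoking the test file directly, and the accompanying note points at a second variable that silences the banner. A hedged sketch of that kind of gating, not the harness's exact parsing:

    import os

    # Leak checking is opt-in via the variable the repro command sets.
    MEM_LEAK_CHECK = os.environ.get("PYTORCH_TEST_CUDA_MEM_LEAK_CHECK", "0") == "1"
    # The repro banner can be silenced with PYTORCH_PRINT_REPRO_ON_FAILURE=0.
    PRINT_REPRO_ON_FAILURE = os.environ.get("PYTORCH_PRINT_REPRO_ON_FAILURE", "1") != "0"

    if PRINT_REPRO_ON_FAILURE:
        print(
            "To execute this test, run the following from the base repo dir:\n"
            "PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python "
            "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py "
            "TestHSDPWithDeviceMeshAndDTensorCUDA."
            "test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda"
        )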
2025-12-04T09:25:19.7876563Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7877130Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7878356Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7878656Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7879280Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7879738Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.7879831Z FAILED [9.3894s] [ 16%] 2025-12-04T09:25:19.7879885Z 2025-12-04T09:25:19.7880031Z =================================== FAILURES =================================== 2025-12-04T09:25:19.7880467Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.7880588Z Traceback (most recent call last): 2025-12-04T09:25:19.7881076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.7881177Z self._join_processes(fn) 2025-12-04T09:25:19.7881707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.7881836Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.7882378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.7882490Z raise RuntimeError(error) 2025-12-04T09:25:19.7882704Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.7882818Z Traceback (most recent call last): 2025-12-04T09:25:19.7883302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7883402Z getattr(self, test_name)() 2025-12-04T09:25:19.7883891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7883973Z fn() 2025-12-04T09:25:19.7884428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7884568Z method(*args, **kwargs) 2025-12-04T09:25:19.7885020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7885120Z method(*args, **kwargs) 2025-12-04T09:25:19.7885570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7885684Z with policy(): 2025-12-04T09:25:19.7886145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7886241Z raise RuntimeError(msg) 
2025-12-04T09:25:19.7887482Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 502202368 and is now 613351424. 2025-12-04T09:25:19.7887491Z 2025-12-04T09:25:19.7887686Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7888526Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7888533Z 2025-12-04T09:25:19.7888777Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7888782Z 2025-12-04T09:25:19.7888786Z 2025-12-04T09:25:19.7888983Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.7889225Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.7890055Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7096ae518bc839e.xml - 2025-12-04T09:25:19.7890221Z =========================== short test summary info ============================ 2025-12-04T09:25:19.7891236Z FAILED [9.3894s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.7891345Z Traceback (most recent call last): 2025-12-04T09:25:19.7891843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7891944Z getattr(self, test_name)() 2025-12-04T09:25:19.7892422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7892512Z fn() 2025-12-04T09:25:19.7892967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7893077Z method(*args, **kwargs) 2025-12-04T09:25:19.7893525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7893617Z method(*args, **kwargs) 2025-12-04T09:25:19.7894079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7894167Z with policy(): 2025-12-04T09:25:19.7894633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7894727Z raise RuntimeError(msg) 2025-12-04T09:25:19.7895955Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 502202368 and is now 613351424. 
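For scale, the deltas in the record above are small on the allocator side but sizable at the driver level; working them out from the reported numbers:

    # Numbers copied from the RuntimeError above (device 3, optim-load test).
    allocator_before, allocator_after = 0, 13_824
    driver_before, driver_after = 502_202_368, 613_351_424

    print(allocator_after - allocator_before)      # 13824 bytes still held by the caching allocator
    print((driver_after - driver_before) / 2**20)  # 106.0 MiB more memory held by the CUDA driver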
2025-12-04T09:25:19.7895986Z 2025-12-04T09:25:19.7896260Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7897330Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7897372Z 2025-12-04T09:25:19.7897653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7897833Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.7898012Z ======================= 1 failed, 2 deselected in 9.41s ======================== 2025-12-04T09:25:19.7898121Z Got exit code 1 2025-12-04T09:25:19.7898229Z Retrying single test... 2025-12-04T09:25:19.7899001Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-dbe06a751e4355d9.xml 2025-12-04T09:25:19.7899163Z ============================= test session starts ============================== 2025-12-04T09:25:19.7899516Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.7899631Z cachedir: .pytest_cache 2025-12-04T09:25:19.7900153Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.7900279Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.7900394Z configfile: pytest.ini 2025-12-04T09:25:19.7900935Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.7901148Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.7902171Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7902286Z Running 1 items in this shard 2025-12-04T09:25:19.7902292Z 2025-12-04T09:25:19.7903651Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 09:21:18.194000 30141 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 30193 2025-12-04T09:25:19.7904160Z I1204 09:21:18.195000 30141 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 30194 2025-12-04T09:25:19.7904661Z I1204 09:21:18.195000 30141 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 30195 2025-12-04T09:25:19.7905152Z I1204 09:21:18.196000 30141 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 30196 2025-12-04T09:25:19.7907592Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.7907714Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7910075Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7910207Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7912344Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7912488Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7914607Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7914725Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7916273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7916410Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7917947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7918080Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7919656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7919787Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7921636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7921770Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7922225Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7922747Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7923751Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7924243Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7925235Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7925670Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7926615Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7927135Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7928068Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7928546Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7929482Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7929913Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7930871Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7931340Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7933180Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
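Separately from the leak failures, every spawn in this file emits the FutureWarning shown above: FSDP.set_state_dict_type() is being deprecated in favor of the get_state_dict()/set_state_dict() helpers it names. A hedged migration sketch, assuming the signatures documented at the linked torch.distributed.checkpoint page and an already-wrapped FSDP model and optimizer:

    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    def roundtrip_state(model, optimizer):
        # Gather sharded model + optimizer state the way the warning recommends,
        # instead of wrapping state_dict() calls in FSDP.set_state_dict_type().
        model_sd, optim_sd = get_state_dict(model, optimizer)
        # ... persist / reload model_sd and optim_sd here ...
        set_state_dict(
            model,
            optimizer,
            model_state_dict=model_sd,
            optim_state_dict=optim_sd,
        )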
2025-12-04T09:25:19.7933631Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7934287Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7935508Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7935824Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7936496Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7937181Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.7937626Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7938139Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7939132Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7939619Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7940584Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7941004Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7941946Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7942450Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7943386Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7943859Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7944792Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7945219Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7946176Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7946644Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7948571Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.7949065Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7949644Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7950858Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7951160Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7951791Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7952258Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.7952652Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7953106Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7953990Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7954595Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7955533Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7955902Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7956810Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7957255Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7958140Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7958587Z 
E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7959470Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7959870Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7960767Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7961208Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7962983Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 613351424. 2025-12-04T09:25:19.7963308Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7963914Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7965207Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7965527Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7966299Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7966764Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.7967157Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7967605Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7968476Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7968945Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7969808Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7970176Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7971007Z E1204 
09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7971429Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7972258Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7972673Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7973518Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7973892Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7974744Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7975161Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7977129Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 506396672 and is now 613351424. 
2025-12-04T09:25:19.7977479Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7978115Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7979505Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7979849Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7980559Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7981082Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.7981203Z FAILED [9.3701s] [100%] 2025-12-04T09:25:19.7981209Z 2025-12-04T09:25:19.7981359Z =================================== FAILURES =================================== 2025-12-04T09:25:19.7981853Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.7981988Z Traceback (most recent call last): 2025-12-04T09:25:19.7982568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.7982697Z self._join_processes(fn) 2025-12-04T09:25:19.7983290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.7983464Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.7984086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.7984203Z raise RuntimeError(error) 2025-12-04T09:25:19.7984441Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.7984576Z Traceback (most recent call last): 2025-12-04T09:25:19.7985123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7985251Z getattr(self, test_name)() 2025-12-04T09:25:19.7985793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7985886Z fn() 2025-12-04T09:25:19.7986404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7986515Z method(*args, **kwargs) 2025-12-04T09:25:19.7987022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7987142Z method(*args, **kwargs) 2025-12-04T09:25:19.7987651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7987764Z with policy(): 2025-12-04T09:25:19.7988276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7988389Z raise RuntimeError(msg) 
2025-12-04T09:25:19.7989945Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 613351424. 2025-12-04T09:25:19.7989954Z 2025-12-04T09:25:19.7990150Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7990999Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7991004Z 2025-12-04T09:25:19.7991244Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7991249Z 2025-12-04T09:25:19.7991253Z 2025-12-04T09:25:19.7991466Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.7991704Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.7992538Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-dbe06a751e4355d9.xml - 2025-12-04T09:25:19.7992706Z =========================== short test summary info ============================ 2025-12-04T09:25:19.7993677Z FAILED [9.3701s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.7993801Z Traceback (most recent call last): 2025-12-04T09:25:19.7994292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7994393Z getattr(self, test_name)() 2025-12-04T09:25:19.7994919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7995002Z fn() 2025-12-04T09:25:19.7995458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7995598Z method(*args, **kwargs) 2025-12-04T09:25:19.7996049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7996156Z method(*args, **kwargs) 2025-12-04T09:25:19.7996607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7996697Z with policy(): 2025-12-04T09:25:19.7997166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7997266Z raise RuntimeError(msg) 2025-12-04T09:25:19.7998514Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 613351424. 
2025-12-04T09:25:19.7998521Z 2025-12-04T09:25:19.7998715Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7999554Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7999571Z 2025-12-04T09:25:19.7999810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7999975Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8000150Z ======================= 1 failed, 7 deselected in 9.39s ======================== 2025-12-04T09:25:19.8000244Z Got exit code 1 2025-12-04T09:25:19.8000341Z Retrying single test... 2025-12-04T09:25:19.8001085Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7f21dedd43754e1.xml 2025-12-04T09:25:19.8001238Z ============================= test session starts ============================== 2025-12-04T09:25:19.8001566Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8001668Z cachedir: .pytest_cache 2025-12-04T09:25:19.8002131Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8002256Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8002357Z configfile: pytest.ini 2025-12-04T09:25:19.8002839Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8003044Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8003963Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8004081Z Running 1 items in this shard 2025-12-04T09:25:19.8004086Z 2025-12-04T09:25:19.8005235Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 09:21:32.303000 30534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 30586 2025-12-04T09:25:19.8005700Z I1204 09:21:32.304000 30534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 30587 2025-12-04T09:25:19.8006142Z I1204 09:21:32.305000 30534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 30588 2025-12-04T09:25:19.8006627Z I1204 09:21:32.306000 30534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 30589 2025-12-04T09:25:19.8018391Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.8018680Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8021346Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8021479Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8023877Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8023998Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8026507Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8026630Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8028361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8028486Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8030213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8030337Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8032058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8032181Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8034024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8034187Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8034611Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8035113Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8036060Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8036541Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8037481Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8037851Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8038753Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8039197Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8040110Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8040556Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8041521Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8041934Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8042955Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8043389Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8045095Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 636420096 and is now 722403328. 
2025-12-04T09:25:19.8045421Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8046015Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8047306Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8047650Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8048313Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8048948Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8049330Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8049786Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8050652Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8051280Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8052193Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8052551Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8053432Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8053862Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8054746Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8055229Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8056115Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8056747Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8057683Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8058157Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8059976Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.8060327Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8060963Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8062335Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8062715Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8063441Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8063960Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8064385Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8064894Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8065869Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8066362Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8067329Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8067698Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8068633Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8069171Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8070059Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8070468Z 
E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8071302Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8071670Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8072497Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8072917Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8074522Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.8074831Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8075393Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8076644Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8076994Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8077605Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8078073Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8078448Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8078908Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8079770Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8080207Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8081060Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8081387Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8082223Z E1204 
09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8082630Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8083515Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8083928Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8084762Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8085135Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8085961Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8086385Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8087986Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 504299520 and is now 613351424. 
2025-12-04T09:25:19.8088289Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8089232Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8090456Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8090781Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8091396Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8091864Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8091955Z FAILED [9.3767s] [100%] 2025-12-04T09:25:19.8091964Z 2025-12-04T09:25:19.8092105Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8092544Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.8092657Z Traceback (most recent call last): 2025-12-04T09:25:19.8093156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8093260Z self._join_processes(fn) 2025-12-04T09:25:19.8093787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8093918Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8094454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8094563Z raise RuntimeError(error) 2025-12-04T09:25:19.8094777Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.8094881Z Traceback (most recent call last): 2025-12-04T09:25:19.8095418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8095518Z getattr(self, test_name)() 2025-12-04T09:25:19.8095998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8096077Z fn() 2025-12-04T09:25:19.8096774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8096888Z method(*args, **kwargs) 2025-12-04T09:25:19.8097397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8097498Z method(*args, **kwargs) 2025-12-04T09:25:19.8098019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8098116Z with policy(): 2025-12-04T09:25:19.8098640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8098750Z raise RuntimeError(msg) 
2025-12-04T09:25:19.8100139Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.8100147Z 2025-12-04T09:25:19.8100374Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8101323Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8101435Z 2025-12-04T09:25:19.8101710Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8101717Z 2025-12-04T09:25:19.8101722Z 2025-12-04T09:25:19.8101951Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8102253Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8103193Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7f21dedd43754e1.xml - 2025-12-04T09:25:19.8103365Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8104475Z FAILED [9.3767s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.8104598Z Traceback (most recent call last): 2025-12-04T09:25:19.8105158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8105271Z getattr(self, test_name)() 2025-12-04T09:25:19.8105810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8105906Z fn() 2025-12-04T09:25:19.8106417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8106520Z method(*args, **kwargs) 2025-12-04T09:25:19.8107034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8107138Z method(*args, **kwargs) 2025-12-04T09:25:19.8107650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8107749Z with policy(): 2025-12-04T09:25:19.8108260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8108432Z raise RuntimeError(msg) 2025-12-04T09:25:19.8109807Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:25:19.8109813Z 2025-12-04T09:25:19.8110013Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8110843Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8110850Z 2025-12-04T09:25:19.8111096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8111255Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8111418Z ======================= 1 failed, 7 deselected in 9.40s ======================== 2025-12-04T09:25:19.8111521Z Got exit code 1 2025-12-04T09:25:19.8112283Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8112646Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.8113329Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7dbc99509eb0f4ce.xml 2025-12-04T09:25:19.8113500Z ============================= test session starts ============================== 2025-12-04T09:25:19.8113821Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8113919Z cachedir: .pytest_cache 2025-12-04T09:25:19.8114378Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8114524Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8114619Z configfile: pytest.ini 2025-12-04T09:25:19.8115107Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8115292Z collecting ... collected 8 items / 3 deselected / 5 selected 2025-12-04T09:25:19.8115417Z stepcurrent: skipping 3 already run items. 2025-12-04T09:25:19.8115527Z Running 5 items in this shard 2025-12-04T09:25:19.8115531Z 2025-12-04T09:25:19.8116679Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 09:21:46.414000 30927 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 30979 2025-12-04T09:25:19.8117139Z I1204 09:21:46.415000 30927 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 30980 2025-12-04T09:25:19.8117579Z I1204 09:21:46.416000 30927 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 30981 2025-12-04T09:25:19.8118012Z I1204 09:21:46.417000 30927 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 30982 2025-12-04T09:25:19.8120150Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8120254Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8122966Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8123086Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8125478Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8125593Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8127980Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8128087Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8129882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8130050Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8131780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8131917Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8133722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8133853Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8135460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8135581Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8136000Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8136540Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8137755Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8138244Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8139211Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8139589Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8140527Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8140999Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8141933Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8142391Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8143331Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8143777Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8144729Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8145222Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8147037Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! 
Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 617545728. 2025-12-04T09:25:19.8147375Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8148016Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8149512Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8149812Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8150431Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8150890Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8151278Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8151788Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8152656Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8153094Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8153947Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8154282Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8155108Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8155524Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8156352Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8156758Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8157589Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8157984Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8158827Z E1204 09:21:53.947000 30979 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8159265Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8160877Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 649003008 and is now 722403328. 2025-12-04T09:25:19.8161180Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8161747Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8162960Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8163256Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8163877Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8164340Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8164722Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8165215Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8166078Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8166513Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8167366Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8167706Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8168535Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8168951Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8169772Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8170176Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8171035Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8171408Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8172275Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8172687Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8174294Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 613351424. 2025-12-04T09:25:19.8174598Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8175163Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8176438Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8176928Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8177623Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8178142Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8178637Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8179144Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8180119Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8180611Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8181579Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:25:19.8181956Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8182889Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8183352Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8184291Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8184776Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8185718Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8186167Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8187112Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8187578Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8189422Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:25:19.8189740Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8190297Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8191517Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8191811Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8192436Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8192942Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8193036Z FAILED [9.3888s] [ 20%] 2025-12-04T09:25:19.8193042Z 2025-12-04T09:25:19.8193181Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8193613Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.8193734Z Traceback (most recent call last): 2025-12-04T09:25:19.8194218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8194322Z self._join_processes(fn) 2025-12-04T09:25:19.8194860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8194984Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8195537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8195637Z raise RuntimeError(error) 2025-12-04T09:25:19.8195843Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8195961Z Traceback (most recent call last): 2025-12-04T09:25:19.8196439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8196537Z getattr(self, test_name)() 2025-12-04T09:25:19.8197016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8197124Z fn() 2025-12-04T09:25:19.8197580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8197673Z method(*args, **kwargs) 2025-12-04T09:25:19.8198124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8198252Z method(*args, **kwargs) 2025-12-04T09:25:19.8198696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8198780Z with policy(): 2025-12-04T09:25:19.8199239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8199336Z raise RuntimeError(msg) 
2025-12-04T09:25:19.8200562Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 617545728. 2025-12-04T09:25:19.8200571Z 2025-12-04T09:25:19.8200762Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8201597Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8201609Z 2025-12-04T09:25:19.8201846Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8201851Z 2025-12-04T09:25:19.8201855Z 2025-12-04T09:25:19.8202053Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8202290Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8203122Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7dbc99509eb0f4ce.xml - 2025-12-04T09:25:19.8203282Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8204327Z FAILED [9.3888s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8204439Z Traceback (most recent call last): 2025-12-04T09:25:19.8204936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8205035Z getattr(self, test_name)() 2025-12-04T09:25:19.8205524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8205603Z fn() 2025-12-04T09:25:19.8206057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8206152Z method(*args, **kwargs) 2025-12-04T09:25:19.8206606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8206699Z method(*args, **kwargs) 2025-12-04T09:25:19.8207153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8207238Z with policy(): 2025-12-04T09:25:19.8207698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8207791Z raise RuntimeError(msg) 2025-12-04T09:25:19.8209024Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 617545728. 
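The RuntimeError above is raised by the test harness's CUDA memory-leak check: it snapshots per-device memory before the test body runs and compares it afterwards, and here every rank reports the caching allocator growing from 0 to 13824 bytes. As a rough illustration only (this is not PyTorch's actual implementation in common_utils.py, and it assumes a CUDA-capable build of torch), the pattern is essentially a context manager like the following:

    import torch
    from contextlib import contextmanager

    @contextmanager
    def cuda_leak_check(device=0, tolerance_bytes=0):
        # Illustrative sketch only; the real check in common_utils.py also
        # compares driver-level counters and retries after empty_cache().
        torch.cuda.synchronize(device)
        before = torch.cuda.memory_allocated(device)
        try:
            yield
        finally:
            torch.cuda.synchronize(device)
            torch.cuda.empty_cache()
            after = torch.cuda.memory_allocated(device)
            if after - before > tolerance_bytes:
                raise RuntimeError(
                    f"possible CUDA leak on device {device}: "
                    f"{before} -> {after} bytes allocated"
                )

A leak is reported when the post-test allocation exceeds the pre-test snapshot, which is exactly the 0 -> 13824 byte delta quoted in the log.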
2025-12-04T09:25:19.8209062Z 2025-12-04T09:25:19.8209254Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8210085Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8210116Z 2025-12-04T09:25:19.8210356Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8210517Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8210673Z ======================= 1 failed, 3 deselected in 9.41s ======================== 2025-12-04T09:25:19.8210762Z Got exit code 1 2025-12-04T09:25:19.8210854Z Retrying single test... 2025-12-04T09:25:19.8211532Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b4af92028672eb6.xml 2025-12-04T09:25:19.8211679Z ============================= test session starts ============================== 2025-12-04T09:25:19.8211994Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8212096Z cachedir: .pytest_cache 2025-12-04T09:25:19.8212553Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8212664Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8212758Z configfile: pytest.ini 2025-12-04T09:25:19.8213229Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8213421Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8214317Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8214417Z Running 1 items in this shard 2025-12-04T09:25:19.8214421Z 2025-12-04T09:25:19.8215620Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 09:22:00.534000 31320 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 31372 2025-12-04T09:25:19.8216066Z I1204 09:22:00.534000 31320 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 31373 2025-12-04T09:25:19.8216747Z I1204 09:22:00.535000 31320 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 31374 2025-12-04T09:25:19.8217240Z I1204 09:22:00.536000 31320 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 31375 2025-12-04T09:25:19.8219651Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.8219770Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8222355Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8222539Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8224944Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8225093Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8227500Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8227626Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8229376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8229512Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8231232Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8231367Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8233235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8233360Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8234974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8235091Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8235508Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8235989Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8236909Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8237365Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8238272Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8238655Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8239531Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8240092Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8240921Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8241328Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8242164Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8242538Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8243378Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8243791Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8245393Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
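Every failing rank prints the same reproduction command, to be run from the base repo dir with the leak check enabled. A small wrapper like the following (a hypothetical convenience script; the command and environment variables themselves are taken verbatim from the log) runs it with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 and shows where PYTORCH_PRINT_REPRO_ON_FAILURE=0 would go to silence the repro banner:

    import os
    import subprocess

    # Enable the CUDA memory-leak check, as the log's repro instructions ask.
    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"  # uncomment to suppress the repro message

    # Run from the base repo dir.
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
            "TestHSDPWithDeviceMeshAndDTensorCUDA."
            "test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda",
        ],
        env=env,
        check=True,
    )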
2025-12-04T09:25:19.8245744Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8246303Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8247511Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8247808Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8248424Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8248882Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8249269Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8249711Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8250571Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8251002Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8251881Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8252215Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8253083Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8253490Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8254325Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8254735Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8255572Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8255944Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8257072Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8257537Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8259413Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 649003008 and is now 722403328. 2025-12-04T09:25:19.8259758Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8260390Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8261757Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8262094Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8262795Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8263313Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8263738Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8264245Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8265218Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8265734Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8266702Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8267114Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8268047Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8268617Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8269584Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8269995Z 
E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8270829Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8271199Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8272034Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8272441Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8274092Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.8274399Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8274960Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8276166Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8276465Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8277078Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8277540Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8277920Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8278370Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8279234Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8279691Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8280546Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8280900Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8281729Z E1204 
09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8282137Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8282970Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8283378Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8284208Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8284578Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8285404Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8285824Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8287471Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 
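Each retry also re-emits the FutureWarning from test_hsdp_dtensor_state_dict.py:188 about FSDP.set_state_dict_type() being deprecated in favor of get_state_dict()/set_state_dict() from torch.distributed.checkpoint.state_dict. A minimal migration sketch, assuming a model and optimizer that in the real test would be FSDP-wrapped inside an initialized process group (a plain module is used here only to keep the snippet self-contained), could look like:

    import torch
    import torch.nn as nn
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    # Stand-ins for the FSDP-wrapped model/optimizer used by the failing test.
    model = nn.Linear(4, 4)
    optimizer = torch.optim.Adam(model.parameters())

    # Extract checkpointable state with the non-deprecated APIs ...
    model_sd, optim_sd = get_state_dict(model, optimizer)
    # ... and load it back (normally after saving/restoring via torch.distributed.checkpoint).
    set_state_dict(model, optimizer, model_state_dict=model_sd, optim_state_dict=optim_sd)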
2025-12-04T09:25:19.8287776Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8288333Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8289538Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8289840Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8290458Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8290917Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8291006Z FAILED [9.4044s] [100%] 2025-12-04T09:25:19.8291012Z 2025-12-04T09:25:19.8291145Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8291575Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.8291708Z Traceback (most recent call last): 2025-12-04T09:25:19.8292211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8292317Z self._join_processes(fn) 2025-12-04T09:25:19.8292879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8293006Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8293544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8293650Z raise RuntimeError(error) 2025-12-04T09:25:19.8293857Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8293969Z Traceback (most recent call last): 2025-12-04T09:25:19.8294450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8294551Z getattr(self, test_name)() 2025-12-04T09:25:19.8295030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8295113Z fn() 2025-12-04T09:25:19.8295562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8295664Z method(*args, **kwargs) 2025-12-04T09:25:19.8296111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8296275Z method(*args, **kwargs) 2025-12-04T09:25:19.8296914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8297011Z with policy(): 2025-12-04T09:25:19.8297537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8297646Z raise RuntimeError(msg) 
2025-12-04T09:25:19.8299092Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8299112Z 2025-12-04T09:25:19.8299331Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8300263Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8300269Z 2025-12-04T09:25:19.8300549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8300556Z 2025-12-04T09:25:19.8300561Z 2025-12-04T09:25:19.8300778Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8301052Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8301990Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b4af92028672eb6.xml - 2025-12-04T09:25:19.8302159Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8303250Z FAILED [9.4044s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8303372Z Traceback (most recent call last): 2025-12-04T09:25:19.8303937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8304095Z getattr(self, test_name)() 2025-12-04T09:25:19.8304632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8304734Z fn() 2025-12-04T09:25:19.8305277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8305388Z method(*args, **kwargs) 2025-12-04T09:25:19.8305896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8305997Z method(*args, **kwargs) 2025-12-04T09:25:19.8306515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8306610Z with policy(): 2025-12-04T09:25:19.8307122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8307241Z raise RuntimeError(msg) 2025-12-04T09:25:19.8308732Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
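The parent-side traceback (_join_processes -> _check_return_codes) shows how the harness turns a child's non-zero exit status into the "Process 1 exited with error code 10" RuntimeError above: each rank runs in its own process, and after joining, any exit code other than 0 fails the test. A stripped-down sketch of that pattern (hypothetical names; exit code 10 simply mirrors the code seen in this log) is:

    import multiprocessing as mp
    import sys

    def _worker(rank):
        # In the real harness each rank runs the test body; here we just
        # simulate the failure path by exiting with code 10.
        sys.exit(10)

    if __name__ == "__main__":
        procs = [mp.Process(target=_worker, args=(r,)) for r in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")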
2025-12-04T09:25:19.8308740Z 2025-12-04T09:25:19.8308959Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8309870Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8309875Z 2025-12-04T09:25:19.8310140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8310317Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8310489Z ======================= 1 failed, 7 deselected in 9.43s ======================== 2025-12-04T09:25:19.8310588Z Got exit code 1 2025-12-04T09:25:19.8310685Z Retrying single test... 2025-12-04T09:25:19.8311525Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c67b11ef8bde4252.xml 2025-12-04T09:25:19.8311695Z ============================= test session starts ============================== 2025-12-04T09:25:19.8312031Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8312143Z cachedir: .pytest_cache 2025-12-04T09:25:19.8312642Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8312759Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8312871Z configfile: pytest.ini 2025-12-04T09:25:19.8313387Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8313596Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8314580Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8314689Z Running 1 items in this shard 2025-12-04T09:25:19.8314694Z 2025-12-04T09:25:19.8315944Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 09:22:14.674000 31713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 31765 2025-12-04T09:25:19.8316427Z I1204 09:22:14.674000 31713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 31766 2025-12-04T09:25:19.8316941Z I1204 09:22:14.675000 31713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 31767 2025-12-04T09:25:19.8317420Z I1204 09:22:14.676000 31713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 31768 2025-12-04T09:25:19.8319880Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.8319985Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8322625Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8322750Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8325131Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8325251Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8327730Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8327849Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8329584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8329718Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8331441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8331577Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8333386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8333551Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8335214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8335374Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8335812Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8336392Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8337548Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8338040Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8339016Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8339409Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8340346Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8340830Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8341771Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8342310Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8343247Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8343669Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8344621Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8345090Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8346920Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
2025-12-04T09:25:19.8347267Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8347919Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8349424Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8349772Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8350411Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8350874Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8351270Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8351720Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8352605Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8353050Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8353912Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8354261Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8355094Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8355524Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8356420Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8356848Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8357675Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8358050Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8358894Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8359314Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8360937Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 615448576. 2025-12-04T09:25:19.8361243Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8361821Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8363065Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8363394Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8364026Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8364488Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8364883Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8365337Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8366226Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8366661Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8367524Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8367870Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8368699Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8369122Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8369998Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8370410Z 
E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8371249Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8371623Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8372467Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8372884Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8374496Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.8374797Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8375400Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8376874Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8377253Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8377965Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8378491Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8378930Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8379439Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8380426Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8380925Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8381888Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8382273Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8383219Z E1204 
09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8383747Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8384686Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8385147Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8386094Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8386518Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8387473Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8387945Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8389760Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:25:19.8390093Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8390657Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8391880Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8392210Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8392843Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8393306Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8393417Z FAILED [9.4964s] [100%] 2025-12-04T09:25:19.8393423Z 2025-12-04T09:25:19.8393557Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8393993Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.8394119Z Traceback (most recent call last): 2025-12-04T09:25:19.8394613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8394716Z self._join_processes(fn) 2025-12-04T09:25:19.8395252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8395384Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8395938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8396046Z raise RuntimeError(error) 2025-12-04T09:25:19.8396259Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8396381Z Traceback (most recent call last): 2025-12-04T09:25:19.8396912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8397030Z getattr(self, test_name)() 2025-12-04T09:25:19.8397508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8397594Z fn() 2025-12-04T09:25:19.8398059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8398156Z method(*args, **kwargs) 2025-12-04T09:25:19.8398609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8398720Z method(*args, **kwargs) 2025-12-04T09:25:19.8399175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8399278Z with policy(): 2025-12-04T09:25:19.8399739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8399842Z raise RuntimeError(msg) 
2025-12-04T09:25:19.8401086Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 2025-12-04T09:25:19.8401092Z 2025-12-04T09:25:19.8401289Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8402168Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8402173Z 2025-12-04T09:25:19.8402416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8402448Z 2025-12-04T09:25:19.8402451Z 2025-12-04T09:25:19.8402663Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8402898Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8403735Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c67b11ef8bde4252.xml - 2025-12-04T09:25:19.8403903Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8404875Z FAILED [9.4964s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8405001Z Traceback (most recent call last): 2025-12-04T09:25:19.8405497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8405601Z getattr(self, test_name)() 2025-12-04T09:25:19.8406094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8406177Z fn() 2025-12-04T09:25:19.8406792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8406909Z method(*args, **kwargs) 2025-12-04T09:25:19.8407387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8407503Z method(*args, **kwargs) 2025-12-04T09:25:19.8407978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8408071Z with policy(): 2025-12-04T09:25:19.8408638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8408747Z raise RuntimeError(msg) 2025-12-04T09:25:19.8410053Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
2025-12-04T09:25:19.8410070Z 2025-12-04T09:25:19.8410276Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8411158Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8411166Z 2025-12-04T09:25:19.8411429Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8411605Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8411795Z ======================= 1 failed, 7 deselected in 9.52s ======================== 2025-12-04T09:25:19.8411891Z Got exit code 1 2025-12-04T09:25:19.8412695Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8413097Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.8413812Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c057f5798619892b.xml 2025-12-04T09:25:19.8414008Z ============================= test session starts ============================== 2025-12-04T09:25:19.8414338Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8414451Z cachedir: .pytest_cache 2025-12-04T09:25:19.8414979Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8415098Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8415201Z configfile: pytest.ini 2025-12-04T09:25:19.8415718Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8415914Z collecting ... collected 8 items / 4 deselected / 4 selected 2025-12-04T09:25:19.8416063Z stepcurrent: skipping 4 already run items. 2025-12-04T09:25:19.8416248Z Running 4 items in this shard 2025-12-04T09:25:19.8416257Z 2025-12-04T09:25:19.8417739Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 09:22:28.804000 32106 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 32158 2025-12-04T09:25:19.8418256Z I1204 09:22:28.804000 32106 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 32159 2025-12-04T09:25:19.8418753Z I1204 09:22:28.805000 32106 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 32160 2025-12-04T09:25:19.8419261Z I1204 09:22:28.806000 32106 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 32161 2025-12-04T09:25:19.8421953Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8422095Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8424492Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8424622Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8427033Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8427164Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8429553Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8429711Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8431472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8431645Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8433525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8433645Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8435187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8435305Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8436844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8436961Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8439470Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8439580Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8441701Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8441817Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8443935Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8444051Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8446174Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8446322Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8447862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8448034Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8449565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8449729Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8451248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8451396Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8452920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8453066Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8453852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8453960Z local_shape = tensor.shape 2025-12-04T09:25:19.8454688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8454785Z local_shape = tensor.shape 2025-12-04T09:25:19.8455498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8455606Z local_shape = tensor.shape 2025-12-04T09:25:19.8456386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8456500Z local_shape = tensor.shape 2025-12-04T09:25:19.8457457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8457552Z tensor.shape, 2025-12-04T09:25:19.8458358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8458452Z tensor.shape, 2025-12-04T09:25:19.8459259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.8459387Z tensor.shape, 2025-12-04T09:25:19.8460185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8460286Z tensor.dtype, 2025-12-04T09:25:19.8461127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8461219Z tensor.dtype, 2025-12-04T09:25:19.8462023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8462114Z tensor.shape, 2025-12-04T09:25:19.8462926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8463024Z tensor.dtype, 2025-12-04T09:25:19.8463821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8463923Z tensor.dtype, 2025-12-04T09:25:19.8464356Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8464868Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8465844Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8466324Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8467302Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8467727Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8468668Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8469191Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8470027Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8470434Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8471259Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8471632Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:25:19.8472456Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8472873Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8474509Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 640614400 and is now 732889088. 2025-12-04T09:25:19.8474869Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8475429Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8476668Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8476974Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8477582Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8478048Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8478423Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8478872Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8479740Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8480163Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8481071Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8481401Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8482231Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8482637Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8483463Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8483875Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8484697Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8485075Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8485903Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8486320Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8487992Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 619642880. 2025-12-04T09:25:19.8488324Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8488882Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8490115Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8490417Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8491033Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8491497Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8491867Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8492308Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8493178Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8493602Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8494507Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:25:19.8494834Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8495664Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8496072Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8497226Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8497699Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8498626Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8499049Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8499982Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8500485Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8502328Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:25:19.8502692Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8503328Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8504722Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8505071Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8505761Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8506286Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8506705Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8507202Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8508184Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8508714Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8509682Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8510008Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8510838Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8511243Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8512072Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8512481Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8513306Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8513684Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8514552Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8514964Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8516638Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 481230848 and is now 615448576. 2025-12-04T09:25:19.8516938Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8517501Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8518745Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8519048Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8519655Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8520116Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8520203Z FAILED [9.8827s] [ 25%] 2025-12-04T09:25:19.8520209Z 2025-12-04T09:25:19.8520341Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8520933Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.8521041Z Traceback (most recent call last): 2025-12-04T09:25:19.8521834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8521956Z self._join_processes(fn) 2025-12-04T09:25:19.8522544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8522694Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8523300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8523413Z raise RuntimeError(error) 2025-12-04T09:25:19.8523650Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8523769Z Traceback (most recent call last): 2025-12-04T09:25:19.8524312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8524426Z getattr(self, test_name)() 2025-12-04T09:25:19.8524961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8525052Z fn() 2025-12-04T09:25:19.8525557Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8525661Z method(*args, **kwargs) 2025-12-04T09:25:19.8526177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8526278Z method(*args, **kwargs) 2025-12-04T09:25:19.8526788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8526920Z with policy(): 2025-12-04T09:25:19.8527427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8527543Z raise RuntimeError(msg) 2025-12-04T09:25:19.8528996Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 640614400 and is now 732889088. 2025-12-04T09:25:19.8529003Z 2025-12-04T09:25:19.8529228Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8530199Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8530208Z 2025-12-04T09:25:19.8530475Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8530487Z 2025-12-04T09:25:19.8530491Z 2025-12-04T09:25:19.8530712Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8530972Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:25:19.8531906Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c057f5798619892b.xml - 2025-12-04T09:25:19.8532072Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8533206Z FAILED [9.8827s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8533327Z Traceback (most recent call last): 2025-12-04T09:25:19.8533942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8534043Z getattr(self, test_name)() 2025-12-04T09:25:19.8534622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8534700Z fn() 2025-12-04T09:25:19.8535150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8535239Z method(*args, **kwargs) 2025-12-04T09:25:19.8535696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8535785Z method(*args, **kwargs) 2025-12-04T09:25:19.8536303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8536399Z with policy(): 2025-12-04T09:25:19.8537053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8537159Z raise RuntimeError(msg) 2025-12-04T09:25:19.8538601Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 640614400 and is now 732889088. 2025-12-04T09:25:19.8538609Z 2025-12-04T09:25:19.8538821Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8539796Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8539837Z 2025-12-04T09:25:19.8540101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8540285Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8540464Z ======================= 1 failed, 4 deselected in 9.90s ======================== 2025-12-04T09:25:19.8540587Z Got exit code 1 2025-12-04T09:25:19.8540695Z Retrying single test... 
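Both failures above come from PyTorch's CUDA memory-leak check (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1): it records per-device allocator usage before the test body and raises if usage is still higher afterwards, which is why the errors quote "was 0 and is now reported as 13824/27136". The snippet below is only a minimal re-creation of that before/after comparison for local debugging; it is not the actual leak-check policy in torch/testing/_internal/common_utils.py, and the CudaLeakProbe name plus the reliance on torch.cuda.memory_allocated()/torch.cuda.mem_get_info() are illustrative assumptions.

import torch

class CudaLeakProbe:
    """Illustrative sketch of a before/after CUDA memory comparison.

    Not the real leak-check policy used by the PyTorch test harness; it only
    mirrors the idea behind PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1.
    """

    def __enter__(self):
        ndev = torch.cuda.device_count()
        for d in range(ndev):
            torch.cuda.synchronize(d)
        torch.cuda.empty_cache()
        # Caching-allocator bytes and driver-level usage (total - free) per device.
        self.alloc_before = [torch.cuda.memory_allocated(d) for d in range(ndev)]
        self.driver_before = [
            total - free
            for free, total in (torch.cuda.mem_get_info(d) for d in range(ndev))
        ]
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # never mask the test's own exception
        for d in range(torch.cuda.device_count()):
            torch.cuda.synchronize(d)
        torch.cuda.empty_cache()
        for d in range(torch.cuda.device_count()):
            alloc_after = torch.cuda.memory_allocated(d)
            free, total = torch.cuda.mem_get_info(d)
            if alloc_after > self.alloc_before[d] and (total - free) > self.driver_before[d]:
                raise RuntimeError(
                    f"possible CUDA leak on device {d}: caching allocator went from "
                    f"{self.alloc_before[d]} to {alloc_after} bytes"
                )
        return False

# Usage while reproducing locally (hypothetical test body):
#     with CudaLeakProbe():
#         run_state_dict_roundtrip()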
2025-12-04T09:25:19.8541454Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-aae1a2ba6806c0ef.xml 2025-12-04T09:25:19.8541622Z ============================= test session starts ============================== 2025-12-04T09:25:19.8541969Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8542070Z cachedir: .pytest_cache 2025-12-04T09:25:19.8542591Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8542713Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8542817Z configfile: pytest.ini 2025-12-04T09:25:19.8543361Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8543565Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8544620Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8544728Z Running 1 items in this shard 2025-12-04T09:25:19.8544733Z 2025-12-04T09:25:19.8546057Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 09:22:43.434000 32559 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 32611 2025-12-04T09:25:19.8546569Z I1204 09:22:43.435000 32559 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 32612 2025-12-04T09:25:19.8547114Z I1204 09:22:43.435000 32559 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 32613 2025-12-04T09:25:19.8547616Z I1204 09:22:43.436000 32559 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 32614 2025-12-04T09:25:19.8550056Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8550167Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8552448Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8552552Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8554665Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8554794Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8556901Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8557023Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8558568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8558684Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8560226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8560336Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8561866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8561977Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8563567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8563678Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8565812Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.8565915Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8568027Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8568128Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8570229Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8570381Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8572497Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8572599Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8574121Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8574274Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8575779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8575929Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8577839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8578005Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8579706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8579863Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8580675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8580783Z local_shape = tensor.shape 2025-12-04T09:25:19.8581601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8581712Z local_shape = tensor.shape 2025-12-04T09:25:19.8582509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8582621Z local_shape = tensor.shape 2025-12-04T09:25:19.8583426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8583569Z local_shape = tensor.shape 2025-12-04T09:25:19.8584366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8584465Z tensor.shape, 2025-12-04T09:25:19.8585309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8585403Z tensor.shape, 2025-12-04T09:25:19.8586210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8586307Z tensor.shape, 2025-12-04T09:25:19.8587107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8587212Z tensor.dtype, 2025-12-04T09:25:19.8588013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8588110Z tensor.shape, 2025-12-04T09:25:19.8589021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8589108Z tensor.dtype, 2025-12-04T09:25:19.8589883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8589970Z tensor.dtype, 2025-12-04T09:25:19.8590747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.8590844Z tensor.dtype, 2025-12-04T09:25:19.8591260Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8591809Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8592754Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8593221Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8594160Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8594520Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8595433Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8595876Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8596786Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8597228Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8598346Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8598749Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8599582Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8600029Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8601666Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 644808704 and is now 724500480. 
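The RuntimeError above comes from the harness's CUDA memory leak check: it records caching-allocator and driver-level memory usage before the test body runs and flags any growth that is still present once the test finishes. A minimal sketch of the same idea, assuming a hypothetical run_test_body callable (this is not the harness's own implementation):

    import gc
    import torch

    def check_cuda_leak(run_test_body, device=0):
        # Snapshot caching-allocator usage and driver-level usage before the test.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)
        driver_before = total - free_before

        run_test_body()

        # Drop Python references and cached blocks so only genuine leaks remain.
        gc.collect()
        torch.cuda.empty_cache()
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        if alloc_after > alloc_before or driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: "
                f"allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver {driver_before} -> {driver_after} bytes"
            )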
2025-12-04T09:25:19.8601975Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8602536Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8603773Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8604081Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8604689Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8605154Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8605579Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8606210Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8607127Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8607575Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8608485Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8608829Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8609711Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8610142Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8611022Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8611447Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8612346Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8612747Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8613653Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8614090Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8615831Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 166658048 and is now 619642880. 2025-12-04T09:25:19.8616142Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8616951Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8618345Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8618687Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8619380Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8619978Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8620410Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8621099Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8622089Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8622570Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8623545Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8623916Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8624864Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8625320Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8626253Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:25:19.8626777Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8627711Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8628178Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8629107Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8629578Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8631430Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8631775Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8632405Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8633890Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8634193Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8634867Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8635340Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8635715Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8636160Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8637033Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8637465Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8638529Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8638874Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T09:25:19.8639757Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8640186Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8641062Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8641530Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8642430Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8642826Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8643703Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8644143Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8645882Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 
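Each failing rank prints the same repro recipe: PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables the leak check for a standalone run, and PYTORCH_PRINT_REPRO_ON_FAILURE=0 suppresses the repro banner. A sketch of invoking that exact command from Python, assuming it is run from the base repo dir:

    import os
    import subprocess

    env = dict(os.environ)
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"    # enable the per-test leak check
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"    # uncomment to silence the repro banner

    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
            "TestHSDPWithDeviceMeshAndDTensorCUDA."
            "test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda",
        ],
        env=env,
        check=True,
    )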
2025-12-04T09:25:19.8646197Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8646799Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8648292Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8648624Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8649347Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8649963Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8650057Z FAILED [9.5086s] [100%] 2025-12-04T09:25:19.8650062Z 2025-12-04T09:25:19.8650202Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8650696Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.8650810Z Traceback (most recent call last): 2025-12-04T09:25:19.8651333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8651435Z self._join_processes(fn) 2025-12-04T09:25:19.8651990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8652131Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8652703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8652807Z raise RuntimeError(error) 2025-12-04T09:25:19.8653035Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8653145Z Traceback (most recent call last): 2025-12-04T09:25:19.8653655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8653785Z getattr(self, test_name)() 2025-12-04T09:25:19.8654284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8654371Z fn() 2025-12-04T09:25:19.8654853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8654977Z method(*args, **kwargs) 2025-12-04T09:25:19.8655458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8655552Z method(*args, **kwargs) 2025-12-04T09:25:19.8656030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8656119Z with policy(): 2025-12-04T09:25:19.8656834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8656953Z raise 
RuntimeError(msg) 2025-12-04T09:25:19.8658379Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8658388Z 2025-12-04T09:25:19.8658607Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8659574Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8659580Z 2025-12-04T09:25:19.8659845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8659859Z 2025-12-04T09:25:19.8659863Z 2025-12-04T09:25:19.8660083Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8660345Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8661351Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-aae1a2ba6806c0ef.xml - 2025-12-04T09:25:19.8661525Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8662659Z FAILED [9.5086s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8662779Z Traceback (most recent call last): 2025-12-04T09:25:19.8663330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8663446Z getattr(self, test_name)() 2025-12-04T09:25:19.8663983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8664071Z fn() 2025-12-04T09:25:19.8664586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8664692Z method(*args, **kwargs) 2025-12-04T09:25:19.8665198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8665300Z method(*args, **kwargs) 2025-12-04T09:25:19.8665799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8665899Z with policy(): 2025-12-04T09:25:19.8666411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8666553Z raise RuntimeError(msg) 2025-12-04T09:25:19.8667989Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:25:19.8668037Z 2025-12-04T09:25:19.8668256Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8669398Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8669403Z 2025-12-04T09:25:19.8669642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8669808Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8669968Z ======================= 1 failed, 7 deselected in 9.53s ======================== 2025-12-04T09:25:19.8670053Z Got exit code 1 2025-12-04T09:25:19.8670150Z Retrying single test... 2025-12-04T09:25:19.8670827Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c34ce2d8050066e8.xml 2025-12-04T09:25:19.8670973Z ============================= test session starts ============================== 2025-12-04T09:25:19.8671281Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8671379Z cachedir: .pytest_cache 2025-12-04T09:25:19.8671842Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8671952Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8672044Z configfile: pytest.ini 2025-12-04T09:25:19.8672527Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8672712Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8673692Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8673797Z Running 1 items in this shard 2025-12-04T09:25:19.8673801Z 2025-12-04T09:25:19.8674971Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 09:22:57.624000 33012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 33064 2025-12-04T09:25:19.8675423Z I1204 09:22:57.625000 33012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 33065 2025-12-04T09:25:19.8675863Z I1204 09:22:57.626000 33012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 33066 2025-12-04T09:25:19.8676306Z I1204 09:22:57.626000 33012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 33067 2025-12-04T09:25:19.8678448Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
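The FutureWarning repeated throughout both runs points to the replacement checkpoint API in torch.distributed.checkpoint.state_dict. A minimal migration sketch, assuming model is already wrapped in FSDP and optimizer is its optimizer; the StateDictOptions values shown are illustrative, not taken from this test:

    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions,
        get_state_dict,
        set_state_dict,
    )

    def checkpoint_roundtrip(model, optimizer):
        # Request sharded (non-full) state dicts without calling FSDP.set_state_dict_type().
        options = StateDictOptions(full_state_dict=False, cpu_offload=False)
        model_sd, optim_sd = get_state_dict(model, optimizer, options=options)

        # Restoring goes through the matching setter with the same options.
        set_state_dict(
            model,
            optimizer,
            model_state_dict=model_sd,
            optim_state_dict=optim_sd,
            options=options,
        )
        return model_sd, optim_sd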
2025-12-04T09:25:19.8678559Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8680699Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8680858Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8682976Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8683088Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8685212Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8685319Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8686853Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8686966Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8688565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8688679Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8690203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8690312Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8691839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8691948Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8694083Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8694209Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8696612Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8696921Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8699313Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8699431Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8701803Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8701918Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8703620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8703848Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8705547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8705714Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8707413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8707577Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8709360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8709499Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8710223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8710348Z local_shape = tensor.shape 2025-12-04T09:25:19.8711065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8711165Z local_shape = tensor.shape 2025-12-04T09:25:19.8711901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8712001Z local_shape = tensor.shape 2025-12-04T09:25:19.8712712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8712804Z tensor.shape, 2025-12-04T09:25:19.8713515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8713599Z tensor.shape, 2025-12-04T09:25:19.8714316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8714401Z tensor.shape, 2025-12-04T09:25:19.8715122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8715202Z tensor.dtype, 2025-12-04T09:25:19.8715908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8715996Z tensor.dtype, 2025-12-04T09:25:19.8716703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.8716798Z tensor.dtype, 2025-12-04T09:25:19.8717515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8717659Z local_shape = tensor.shape 2025-12-04T09:25:19.8718391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8718474Z tensor.shape, 2025-12-04T09:25:19.8719183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8719274Z tensor.dtype, 2025-12-04T09:25:19.8719657Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8720116Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8721117Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8721770Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8722745Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8723117Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8724062Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8724597Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8725546Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8726047Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8726981Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8727411Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8728355Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8728829Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8730678Z E1204 09:23:05.672000 33066 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 531562496 and is now 619642880. 2025-12-04T09:25:19.8731026Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8731659Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8733131Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8733475Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8734197Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8734661Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8735044Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8735499Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8736424Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8737057Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8738021Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8738391Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8739364Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8739824Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8740806Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8741265Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8742195Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8742623Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8743564Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8744035Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8745881Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 636420096 and is now 724500480. 2025-12-04T09:25:19.8746231Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8746914Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8748313Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8748656Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8749389Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8749857Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8750234Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8750689Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8751559Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8751985Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8752845Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8753205Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8754046Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:25:19.8754532Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8755355Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8755772Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8756597Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8756982Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8757813Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8758230Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8759872Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8760183Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8760788Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8762024Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8762326Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8762940Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8763402Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8763781Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8764229Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8765095Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8765520Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T09:25:19.8766382Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8766734Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8767567Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8767999Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8768825Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8769238Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8770067Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8770443Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8771270Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8771690Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8773319Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:25:19.8773679Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8774255Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8775491Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8775792Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8776470Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8777156Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8777327Z FAILED [9.9274s] [100%] 2025-12-04T09:25:19.8777333Z 2025-12-04T09:25:19.8777478Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8778012Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.8778133Z Traceback (most recent call last): 2025-12-04T09:25:19.8778690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8778801Z self._join_processes(fn) 2025-12-04T09:25:19.8779424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8779572Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8780183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8780323Z raise RuntimeError(error) 2025-12-04T09:25:19.8780566Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8780682Z Traceback (most recent call last): 2025-12-04T09:25:19.8781228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8781338Z getattr(self, test_name)() 2025-12-04T09:25:19.8781872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8781967Z fn() 2025-12-04T09:25:19.8782479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8782579Z method(*args, **kwargs) 2025-12-04T09:25:19.8783091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8783199Z method(*args, **kwargs) 2025-12-04T09:25:19.8783710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8783808Z with policy(): 2025-12-04T09:25:19.8784315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8784430Z raise 
RuntimeError(msg) 2025-12-04T09:25:19.8785855Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 636420096 and is now 724500480. 2025-12-04T09:25:19.8785863Z 2025-12-04T09:25:19.8786088Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8787118Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8787126Z 2025-12-04T09:25:19.8787400Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8787405Z 2025-12-04T09:25:19.8787409Z 2025-12-04T09:25:19.8787625Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8787890Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8788936Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c34ce2d8050066e8.xml - 2025-12-04T09:25:19.8789090Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8790097Z FAILED [9.9274s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8790205Z Traceback (most recent call last): 2025-12-04T09:25:19.8790700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8790807Z getattr(self, test_name)() 2025-12-04T09:25:19.8791283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8791402Z fn() 2025-12-04T09:25:19.8791854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8791945Z method(*args, **kwargs) 2025-12-04T09:25:19.8792406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8792524Z method(*args, **kwargs) 2025-12-04T09:25:19.8792968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8793063Z with policy(): 2025-12-04T09:25:19.8793517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8793619Z raise RuntimeError(msg) 2025-12-04T09:25:19.8794880Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 636420096 and is now 724500480. 
2025-12-04T09:25:19.8794887Z 2025-12-04T09:25:19.8795076Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8795947Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8795954Z 2025-12-04T09:25:19.8796188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8796353Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8796509Z ======================= 1 failed, 7 deselected in 9.95s ======================== 2025-12-04T09:25:19.8796593Z Got exit code 1 2025-12-04T09:25:19.8797382Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8797746Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.8798797Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-fde5b3ce12e5a98a.xml 2025-12-04T09:25:19.8798945Z ============================= test session starts ============================== 2025-12-04T09:25:19.8799254Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8799352Z cachedir: .pytest_cache 2025-12-04T09:25:19.8799809Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8799922Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8800016Z configfile: pytest.ini 2025-12-04T09:25:19.8800492Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8800681Z collecting ... collected 8 items / 5 deselected / 3 selected 2025-12-04T09:25:19.8800805Z stepcurrent: skipping 5 already run items. 2025-12-04T09:25:19.8800903Z Running 3 items in this shard 2025-12-04T09:25:19.8800908Z 2025-12-04T09:25:19.8802093Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 09:23:12.214000 33465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 33517 2025-12-04T09:25:19.8802532Z I1204 09:23:12.215000 33465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 33518 2025-12-04T09:25:19.8802975Z I1204 09:23:12.216000 33465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 33519 2025-12-04T09:25:19.8803436Z I1204 09:23:12.216000 33465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 33520 2025-12-04T09:25:19.8805587Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8805713Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8807845Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8807948Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8810077Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8810174Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8812319Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8812425Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8813951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8814071Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8815592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8815714Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8817565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8817699Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8819413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8819606Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8822183Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8822295Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8824692Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8824804Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8827179Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8827291Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8829791Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8829904Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8831616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8831779Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8833539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8833689Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8835295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8835478Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8837076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8837272Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8838031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8838144Z local_shape = tensor.shape 2025-12-04T09:25:19.8838901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8839008Z local_shape = tensor.shape 2025-12-04T09:25:19.8839778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8839882Z local_shape = tensor.shape 2025-12-04T09:25:19.8840649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8840739Z tensor.shape, 2025-12-04T09:25:19.8841495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8841596Z tensor.shape, 2025-12-04T09:25:19.8842352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8842453Z tensor.dtype, 2025-12-04T09:25:19.8843262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.8843352Z tensor.dtype, 2025-12-04T09:25:19.8844111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8844199Z tensor.shape, 2025-12-04T09:25:19.8844958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8845044Z tensor.dtype, 2025-12-04T09:25:19.8845796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8845905Z local_shape = tensor.shape 2025-12-04T09:25:19.8846662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8846754Z tensor.shape, 2025-12-04T09:25:19.8847510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8847599Z tensor.dtype, 2025-12-04T09:25:19.8848012Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8848487Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8849439Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8849908Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8850840Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8851200Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8852083Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8852529Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8853407Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8853942Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8854772Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8855141Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T09:25:19.8855980Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8856465Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8858527Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 168755200 and is now 621740032. 2025-12-04T09:25:19.8858868Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8859500Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8860918Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8861258Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8861956Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8862475Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8862908Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8863437Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8864415Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8864930Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8865895Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8866271Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8867199Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8867665Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8868703Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8869248Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8870125Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8870515Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8871404Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8871894Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8873638Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 640614400 and is now 732889088. 2025-12-04T09:25:19.8873955Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8874549Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8875866Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8876186Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8876836Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8877329Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8877764Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8878237Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8879160Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8879658Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8880740Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:25:19.8881106Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8882019Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8882464Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8883378Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8883821Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8884740Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8885147Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8886223Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8886659Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8888382Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:25:19.8888880Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8889495Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8890851Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8891175Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8891845Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8892347Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8892782Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8893278Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8894244Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8894715Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8895648Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8896010Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8897181Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8897649Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8898603Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8899060Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8900001Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8900486Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8901432Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8901893Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8903730Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8904075Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8904714Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8906121Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8906462Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8907160Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8907715Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8907817Z FAILED [10.1799s] [ 33%] 2025-12-04T09:25:19.8907827Z 2025-12-04T09:25:19.8908013Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8908776Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.8908901Z Traceback (most recent call last): 2025-12-04T09:25:19.8909418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8909524Z self._join_processes(fn) 2025-12-04T09:25:19.8910080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8910215Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8910793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8910901Z raise RuntimeError(error) 2025-12-04T09:25:19.8911127Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8911248Z Traceback (most recent call last): 2025-12-04T09:25:19.8911758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8911860Z getattr(self, test_name)() 2025-12-04T09:25:19.8912376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8912460Z fn() 2025-12-04T09:25:19.8912942Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8913043Z method(*args, **kwargs) 2025-12-04T09:25:19.8913519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8913627Z method(*args, **kwargs) 2025-12-04T09:25:19.8914155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8914248Z with policy(): 2025-12-04T09:25:19.8914735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8914840Z raise RuntimeError(msg) 2025-12-04T09:25:19.8916171Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8916179Z 2025-12-04T09:25:19.8916383Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8917298Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8917313Z 2025-12-04T09:25:19.8917564Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8917569Z 2025-12-04T09:25:19.8917724Z Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.8917843Z Traceback (most recent call last): 2025-12-04T09:25:19.8918361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8918462Z getattr(self, test_name)() 2025-12-04T09:25:19.8918970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8919083Z fn() 2025-12-04T09:25:19.8919565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8919661Z method(*args, **kwargs) 2025-12-04T09:25:19.8920140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8920269Z method(*args, **kwargs) 2025-12-04T09:25:19.8920863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8920962Z with policy(): 2025-12-04T09:25:19.8921638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8921743Z raise RuntimeError(msg) 2025-12-04T09:25:19.8923168Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 168755200 and is now 621740032. 
2025-12-04T09:25:19.8923177Z 2025-12-04T09:25:19.8923396Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8924361Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8924367Z 2025-12-04T09:25:19.8924635Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8924640Z 2025-12-04T09:25:19.8924644Z 2025-12-04T09:25:19.8924860Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8925128Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8926068Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-fde5b3ce12e5a98a.xml - 2025-12-04T09:25:19.8926340Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8927466Z FAILED [10.1799s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8927593Z Traceback (most recent call last): 2025-12-04T09:25:19.8928145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8928257Z getattr(self, test_name)() 2025-12-04T09:25:19.8928802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8928894Z fn() 2025-12-04T09:25:19.8929401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8929511Z method(*args, **kwargs) 2025-12-04T09:25:19.8930020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8930134Z method(*args, **kwargs) 2025-12-04T09:25:19.8930639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8930736Z with policy(): 2025-12-04T09:25:19.8931261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8931365Z raise RuntimeError(msg) 2025-12-04T09:25:19.8932775Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:25:19.8932842Z 2025-12-04T09:25:19.8933061Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8934118Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8934123Z 2025-12-04T09:25:19.8934381Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8934386Z 2025-12-04T09:25:19.8934538Z Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.8934656Z Traceback (most recent call last): 2025-12-04T09:25:19.8935169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8935271Z getattr(self, test_name)() 2025-12-04T09:25:19.8935783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8935866Z fn() 2025-12-04T09:25:19.8936410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8936519Z method(*args, **kwargs) 2025-12-04T09:25:19.8937185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8937296Z method(*args, **kwargs) 2025-12-04T09:25:19.8937802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8937896Z with policy(): 2025-12-04T09:25:19.8938418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8938530Z raise RuntimeError(msg) 2025-12-04T09:25:19.8940015Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 168755200 and is now 621740032. 2025-12-04T09:25:19.8940024Z 2025-12-04T09:25:19.8940241Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8941199Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8941214Z 2025-12-04T09:25:19.8941481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8941661Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8941850Z ======================= 1 failed, 5 deselected in 10.20s ======================= 2025-12-04T09:25:19.8941947Z Got exit code 1 2025-12-04T09:25:19.8942050Z Retrying single test... 
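
Note on the failures above: the mem-leak check compares per-device memory before and after each test (the error text shows both the caching-allocator figure and the CUDA-driver figure moving from their pre-test values), and the repro line it prints (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.<test name>) re-runs the single test with that check enabled. The snippet below is only a minimal sketch of that kind of before/after accounting, not the actual checker in torch/testing/_internal/common_utils.py; it assumes torch with at least one CUDA device, and run_test_body is a hypothetical placeholder for the test body under suspicion.

    import torch

    def device_memory(device: int):
        # Bytes currently held by the caching allocator on this device.
        allocator = torch.cuda.memory_allocated(device)
        # Driver-level usage: total minus free, as reported by cudaMemGetInfo.
        free, total = torch.cuda.mem_get_info(device)
        return allocator, total - free

    def assert_no_leak(run_test_body, device: int = 0):
        # Hypothetical helper, not PyTorch's checker: snapshot, run, snapshot again.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before, driver_before = device_memory(device)

        run_test_body()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after, driver_after = device_memory(device)

        if alloc_after > alloc_before or driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA memory leak on device {device}: "
                f"caching allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver {driver_before} -> {driver_after} bytes"
            )
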
2025-12-04T09:25:19.8942819Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b1cbedcab1229122.xml 2025-12-04T09:25:19.8942984Z ============================= test session starts ============================== 2025-12-04T09:25:19.8943340Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8943445Z cachedir: .pytest_cache 2025-12-04T09:25:19.8943959Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8944087Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8944192Z configfile: pytest.ini 2025-12-04T09:25:19.8944755Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8944970Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8946022Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8946169Z Running 1 items in this shard 2025-12-04T09:25:19.8946174Z 2025-12-04T09:25:19.8947492Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 09:23:26.944000 33918 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 33970 2025-12-04T09:25:19.8947996Z I1204 09:23:26.945000 33918 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 33971 2025-12-04T09:25:19.8948492Z I1204 09:23:26.945000 33918 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 33972 2025-12-04T09:25:19.8949071Z I1204 09:23:26.946000 33918 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 33973 2025-12-04T09:25:19.8951229Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8951330Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8953514Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8953616Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8955741Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8955839Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8957970Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8958071Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8959609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8959751Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8961279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8961443Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8962965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8963083Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8964626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8964752Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8966884Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.8966993Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8969151Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8969258Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8971368Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8971472Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8973583Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8973683Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8975207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8975434Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8977269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8977438Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8979145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8979310Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8981022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8981177Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8981986Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8982106Z local_shape = tensor.shape 2025-12-04T09:25:19.8982914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8983097Z tensor.shape, 2025-12-04T09:25:19.8983903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8983999Z tensor.dtype, 2025-12-04T09:25:19.8984821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8984929Z local_shape = tensor.shape 2025-12-04T09:25:19.8985737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8985835Z tensor.shape, 2025-12-04T09:25:19.8986643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8986747Z tensor.dtype, 2025-12-04T09:25:19.8987546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8987665Z local_shape = tensor.shape 2025-12-04T09:25:19.8988468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8988562Z tensor.shape, 2025-12-04T09:25:19.8989413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8989525Z tensor.dtype, 2025-12-04T09:25:19.8990243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8990365Z local_shape = tensor.shape 2025-12-04T09:25:19.8991078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8991169Z tensor.shape, 2025-12-04T09:25:19.8991884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.8991966Z tensor.dtype, 2025-12-04T09:25:19.8992353Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8992806Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9017723Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9018301Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9019283Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9019650Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9020582Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9021407Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9022367Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9022822Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9023766Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9024183Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9025127Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9025585Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9027427Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 649003008 and is now 724500480. 
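[Editor's note] The RuntimeError above comes from PyTorch's CUDA memory-leak checker, which compares caching-allocator and driver-level memory before and after the test body on each rank. The snippet below is a minimal illustration of that kind of before/after comparison, not the actual harness code in common_utils.py; `check_for_cuda_leak` and `run_test_body` are hypothetical names introduced only for this sketch.

```python
# Minimal sketch (NOT the PyTorch test harness) of the before/after comparison
# behind "CUDA driver API confirmed a leak ...". Assumes a single CUDA device;
# `run_test_body` is a hypothetical callable standing in for the test under check.
import gc
import torch

def check_for_cuda_leak(run_test_body, device=0):
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    allocated_before = torch.cuda.memory_allocated(device)  # caching-allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)    # driver-level view
    driver_before = total - free_before

    run_test_body()

    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    allocated_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if allocated_after > allocated_before:
        raise RuntimeError(
            f"Caching allocator allocated memory was {allocated_before} "
            f"and is now reported as {allocated_after} on device {device}. "
            f"CUDA driver allocated memory was {driver_before} and is now {driver_after}."
        )
```

The repro line printed in the log (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py ...) re-runs just this test with the check enabled, which is usually the quickest way to confirm whether the leak reproduces outside the CI shard.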
2025-12-04T09:25:19.9027770Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9028449Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9029850Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9030226Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9030924Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9031440Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9031866Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9032379Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9033502Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9033933Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9034780Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9035118Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9036210Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9036642Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9037521Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9037947Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9038823Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9039216Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9040094Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9040535Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9042256Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 518979584 and is now 615448576. 2025-12-04T09:25:19.9042605Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9043195Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9044533Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9044847Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9045503Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9045991Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9046389Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9046870Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9047781Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9048235Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9049136Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9049481Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9050415Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9050847Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9051726Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:25:19.9052153Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9053028Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9053419Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9054302Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9054739Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9056546Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 395247616 and is now 615448576. 2025-12-04T09:25:19.9057095Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9057754Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9059147Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9059480Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9060170Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9060690Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9061116Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9061628Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9062601Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9063081Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9064039Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9064458Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T09:25:19.9065399Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9065854Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9066787Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9067241Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9068185Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9068708Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9069670Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9070082Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9071711Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 
2025-12-04T09:25:19.9072066Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9072622Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9073854Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9074153Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9074756Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9075223Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9075316Z FAILED [10.0662s] [100%] 2025-12-04T09:25:19.9075323Z 2025-12-04T09:25:19.9075462Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9075914Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.9076017Z Traceback (most recent call last): 2025-12-04T09:25:19.9076507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9076609Z self._join_processes(fn) 2025-12-04T09:25:19.9077132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9077256Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9077850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9077957Z raise RuntimeError(error) 2025-12-04T09:25:19.9078161Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.9078266Z Traceback (most recent call last): 2025-12-04T09:25:19.9078749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9078845Z getattr(self, test_name)() 2025-12-04T09:25:19.9079323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9079403Z fn() 2025-12-04T09:25:19.9079849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9079947Z method(*args, **kwargs) 2025-12-04T09:25:19.9080397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9080487Z method(*args, **kwargs) 2025-12-04T09:25:19.9080936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9081023Z with policy(): 2025-12-04T09:25:19.9081479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9081576Z raise 
RuntimeError(msg) 2025-12-04T09:25:19.9082827Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 2025-12-04T09:25:19.9082866Z 2025-12-04T09:25:19.9083063Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9083948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9083954Z 2025-12-04T09:25:19.9084193Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9084198Z 2025-12-04T09:25:19.9084202Z 2025-12-04T09:25:19.9084399Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9084637Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.9085474Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b1cbedcab1229122.xml - 2025-12-04T09:25:19.9085625Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9086629Z FAILED [10.0662s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.9086737Z Traceback (most recent call last): 2025-12-04T09:25:19.9087231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9087332Z getattr(self, test_name)() 2025-12-04T09:25:19.9087807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9087894Z fn() 2025-12-04T09:25:19.9088343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9088439Z method(*args, **kwargs) 2025-12-04T09:25:19.9088937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9089032Z method(*args, **kwargs) 2025-12-04T09:25:19.9089484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9089568Z with policy(): 2025-12-04T09:25:19.9090017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9090118Z raise RuntimeError(msg) 2025-12-04T09:25:19.9091362Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 
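[Editor's note] Separately from the leak itself, the FutureWarning repeated at test_hsdp_dtensor_state_dict.py:118 and :130 above points at the newer torch.distributed.checkpoint state-dict APIs. The following is a hedged sketch of that recommended replacement, not the test's actual code: a plain nn.Linear and SGD optimizer stand in for the FSDP-wrapped model and optimizer used by the test, on the assumption that get_state_dict/set_state_dict also accept a non-wrapped module as the warning's documentation link suggests.

```python
# Hedged sketch of the APIs named in the FutureWarning above; NOT the code of
# test_hsdp_dtensor_state_dict.py. `model` and `optimizer` are placeholders for
# the FSDP-wrapped module and its optimizer in the real test.
import torch
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Instead of FSDP.set_state_dict_type(...) followed by model.state_dict():
model_state_dict, optim_state_dict = get_state_dict(model, optimizer)

# ... persist/restore the dicts as needed (e.g. via torch.distributed.checkpoint) ...

# Load both back in one call, per the warning's recommendation:
set_state_dict(
    model,
    optimizer,
    model_state_dict=model_state_dict,
    optim_state_dict=optim_state_dict,
)
```

Per the warning text, the same pair of calls is meant to cover FSDP1, FSDP2 and DDP, which is why the log nudges tests away from the per-wrapper set_state_dict_type context manager. The retried session follows below.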
2025-12-04T09:25:19.9091369Z 2025-12-04T09:25:19.9091567Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9092424Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9092432Z 2025-12-04T09:25:19.9092673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9092830Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.9092987Z ======================= 1 failed, 7 deselected in 10.09s ======================= 2025-12-04T09:25:19.9093078Z Got exit code 1 2025-12-04T09:25:19.9093171Z Retrying single test... 2025-12-04T09:25:19.9093863Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6d24496891daae4f.xml 2025-12-04T09:25:19.9094008Z ============================= test session starts ============================== 2025-12-04T09:25:19.9094318Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9094447Z cachedir: .pytest_cache 2025-12-04T09:25:19.9094903Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9095011Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9095109Z configfile: pytest.ini 2025-12-04T09:25:19.9095586Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9095764Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9096953Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9097066Z Running 1 items in this shard 2025-12-04T09:25:19.9097077Z 2025-12-04T09:25:19.9098410Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 09:23:41.683000 34371 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 34423 2025-12-04T09:25:19.9098909Z I1204 09:23:41.684000 34371 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 34424 2025-12-04T09:25:19.9099405Z I1204 09:23:41.685000 34371 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 34425 2025-12-04T09:25:19.9099893Z I1204 09:23:41.686000 34371 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 34426 2025-12-04T09:25:19.9102387Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.9102500Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9104886Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9105003Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9107406Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9107521Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9109877Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9110032Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9111574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9111692Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9113231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9113355Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9114869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9114978Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9116501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9116731Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9118871Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9118968Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9121400Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9121518Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9123893Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9124064Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9126450Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9126609Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9128322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9128488Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.9130199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.9130362Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.9132063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9132230Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.9134014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9134164Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.9134879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9134983Z local_shape = tensor.shape 2025-12-04T09:25:19.9135690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9135789Z local_shape = tensor.shape 2025-12-04T09:25:19.9136731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9136834Z tensor.shape, 2025-12-04T09:25:19.9137648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9137742Z tensor.shape, 2025-12-04T09:25:19.9138553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9138654Z tensor.dtype, 2025-12-04T09:25:19.9139452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9139578Z tensor.dtype, 2025-12-04T09:25:19.9140377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9140492Z local_shape = tensor.shape 2025-12-04T09:25:19.9141319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9141419Z tensor.shape, 2025-12-04T09:25:19.9142216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.9142331Z local_shape = tensor.shape 2025-12-04T09:25:19.9143126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9143220Z tensor.dtype, 2025-12-04T09:25:19.9144034Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9144128Z tensor.shape, 2025-12-04T09:25:19.9144933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9145028Z tensor.dtype, 2025-12-04T09:25:19.9145453Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9145964Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9146934Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9147472Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9148555Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9149011Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9149842Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9150247Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9151087Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9151490Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9152322Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9152690Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9153514Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9153953Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9155590Z E1204 09:23:49.751000 34423 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 649003008 and is now 724500480. 2025-12-04T09:25:19.9155918Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9156471Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9157710Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9158011Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9158624Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9159088Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9159462Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9159914Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9160776Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9161250Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9162105Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9162429Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9163258Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9163664Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9164497Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9164903Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9165732Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9166274Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9167177Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9167617Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9169365Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.9169684Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9170272Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9171584Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9171900Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9172543Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9173033Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9173424Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9173908Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9175059Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9175533Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9176544Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9177082Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9178032Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9178494Z 
E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9179436Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9179888Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9180814Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9181283Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9182219Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9183053Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9184890Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.9185236Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9185864Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9187274Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9187610Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9188405Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9188914Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9189436Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9190066Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9190929Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9191350Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T09:25:19.9192209Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9192535Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9193369Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9193773Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9194601Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9195003Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9195990Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9196416Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9197295Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9197771Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9199498Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 456065024 and is now 615448576. 
2025-12-04T09:25:19.9199817Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9200408Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9201707Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9202026Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9202664Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9203158Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9203253Z FAILED [10.0045s] [100%] 2025-12-04T09:25:19.9203259Z 2025-12-04T09:25:19.9203455Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9204037Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.9204143Z Traceback (most recent call last): 2025-12-04T09:25:19.9204631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9204730Z self._join_processes(fn) 2025-12-04T09:25:19.9205245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9205375Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9205910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9206013Z raise RuntimeError(error) 2025-12-04T09:25:19.9206221Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9206326Z Traceback (most recent call last): 2025-12-04T09:25:19.9206812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9206906Z getattr(self, test_name)() 2025-12-04T09:25:19.9207376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9207461Z fn() 2025-12-04T09:25:19.9207908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9208007Z method(*args, **kwargs) 2025-12-04T09:25:19.9208481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9208571Z method(*args, **kwargs) 2025-12-04T09:25:19.9209025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9209133Z with policy(): 2025-12-04T09:25:19.9209581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9209683Z raise 
RuntimeError(msg) 2025-12-04T09:25:19.9210936Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.9210942Z 2025-12-04T09:25:19.9211138Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9211990Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9211999Z 2025-12-04T09:25:19.9212241Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9212246Z 2025-12-04T09:25:19.9212250Z 2025-12-04T09:25:19.9212441Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9212671Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.9213504Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6d24496891daae4f.xml - 2025-12-04T09:25:19.9213651Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9214653Z FAILED [10.0045s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9214809Z Traceback (most recent call last): 2025-12-04T09:25:19.9215300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9215396Z getattr(self, test_name)() 2025-12-04T09:25:19.9215866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9215948Z fn() 2025-12-04T09:25:19.9216462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9216720Z method(*args, **kwargs) 2025-12-04T09:25:19.9217234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9217333Z method(*args, **kwargs) 2025-12-04T09:25:19.9217847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9217945Z with policy(): 2025-12-04T09:25:19.9218450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9218566Z raise RuntimeError(msg) 2025-12-04T09:25:19.9219977Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
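For context on this failure mode: the check enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 snapshots CUDA memory counters before the test body and compares them afterwards, failing the test if allocations do not return to the baseline (here 0 -> 27136 bytes on the caching allocator). The following is only a minimal sketch of that accounting using public torch.cuda counters; it is not the harness in torch/testing/_internal/common_utils.py, and the helper name assert_no_cuda_leak is invented for illustration.

    # Minimal sketch of the accounting behind PYTORCH_TEST_CUDA_MEM_LEAK_CHECK.
    # Not the real CI harness; assert_no_cuda_leak is an invented helper name.
    import torch

    def assert_no_cuda_leak(fn, device: int = 0) -> None:
        if not torch.cuda.is_available():
            return  # nothing to measure on CPU-only machines
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        before = torch.cuda.memory_allocated(device)  # caching-allocator bytes in use
        fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: "
                f"allocated memory was {before} and is now {after}"
            )

    if __name__ == "__main__":
        kept_alive = []
        # A tensor that outlives the test body shows up as a nonzero delta,
        # analogous to the 0 -> 27136 byte jump reported in the log above.
        try:
            assert_no_cuda_leak(lambda: kept_alive.append(torch.ones(1024, device="cuda")))
        except RuntimeError as err:
            print(err)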
2025-12-04T09:25:19.9219983Z 
2025-12-04T09:25:19.9220242Z To execute this test, run the following from the base repo dir:
2025-12-04T09:25:19.9221426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T09:25:19.9221433Z 
2025-12-04T09:25:19.9221704Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:25:19.9221958Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:25:19.9222133Z ======================= 1 failed, 7 deselected in 10.03s =======================
2025-12-04T09:25:19.9222237Z Got exit code 1
2025-12-04T09:25:19.9223127Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T09:25:19.9223532Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:25:19.9224296Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e815db3b6b0b67f1.xml
2025-12-04T09:25:19.9224461Z ============================= test session starts ==============================
2025-12-04T09:25:19.9224821Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:25:19.9224924Z cachedir: .pytest_cache
2025-12-04T09:25:19.9225436Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:25:19.9225559Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:25:19.9225661Z configfile: pytest.ini
2025-12-04T09:25:19.9226197Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:25:19.9226404Z collecting ... collected 8 items / 6 deselected / 2 selected
2025-12-04T09:25:19.9226542Z stepcurrent: skipping 6 already run items.
2025-12-04T09:25:19.9226659Z Running 2 items in this shard
2025-12-04T09:25:19.9226664Z 
2025-12-04T09:25:19.9227899Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 09:23:56.344000 34824 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 34876
2025-12-04T09:25:19.9228409Z I1204 09:23:56.345000 34824 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 34877
2025-12-04T09:25:19.9228913Z I1204 09:23:56.345000 34824 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 34878
2025-12-04T09:25:19.9229399Z I1204 09:23:56.346000 34824 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 34879
2025-12-04T09:25:19.9231820Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP.
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9231941Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9234350Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9234488Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9236600Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9236725Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9238843Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9238943Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9240486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9240605Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9242115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9242234Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9243795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.9243907Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9245441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9245551Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9245938Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9246388Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9247255Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9247678Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9248526Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9248882Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9249706Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9250145Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9250964Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9251366Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9252197Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9252565Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9253394Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9253803Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9255279Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. 
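The FutureWarning above recommends migrating from FSDP.set_state_dict_type() to the torch.distributed.checkpoint.state_dict helpers get_state_dict() and set_state_dict() (see the linked API doc and tutorial). A rough, hedged sketch of that call pattern follows; it uses a throwaway single-process gloo group and a plain nn.Linear as stand-ins, not the FSDP-wrapped model from this test file.

    # Hedged sketch of the get_state_dict()/set_state_dict() API recommended by the
    # FutureWarning above. The single-process gloo group, model, and optimizer are
    # placeholders; the real test wraps the module in FSDP across several ranks.
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    dist.init_process_group(
        backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
    )

    model = nn.Linear(4, 4)
    optim = torch.optim.SGD(model.parameters(), lr=0.1)

    # One call returns both state dicts and, per the warning, works across plain
    # modules, DDP, FSDP1, and FSDP2 without choosing a state_dict_type up front.
    model_sd, optim_sd = get_state_dict(model, optimizers=optim)

    # Restoration goes through the matching setter rather than load_state_dict().
    set_state_dict(
        model, optimizers=optim, model_state_dict=model_sd, optim_state_dict=optim_sd
    )

    dist.destroy_process_group()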
CUDA driver allocated memory was 649003008 and is now 734986240. 2025-12-04T09:25:19.9255576Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9256243Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9257593Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9257923Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9258609Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9259126Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9259554Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9260057Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9261029Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9261509Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9262472Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9262877Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9263803Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9264292Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9265215Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9265669Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9266604Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9267018Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9267954Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9268410Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9270035Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:25:19.9270331Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9270938Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9272011Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9272307Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9272915Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9273556Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9273952Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9274423Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9275331Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9275783Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9276680Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9277059Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9277932Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9278387Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9279263Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9279689Z E1204 09:24:03.409000 34878 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9280571Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9280963Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9281849Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9282279Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9283836Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.9284342Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9284900Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9285970Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9286261Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9286875Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9287324Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9287697Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9288142Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9288995Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9289420Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9290299Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9290628Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9291479Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9291879Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9292708Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9293113Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9293940Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9294309Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9295131Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9295546Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9297365Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 477036544 and is now 613351424. 
2025-12-04T09:25:19.9297714Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9298343Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9299547Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9299882Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9300573Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9301099Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9301199Z FAILED [9.2135s] [ 50%] 2025-12-04T09:25:19.9301206Z 2025-12-04T09:25:19.9301359Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9301700Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___ 2025-12-04T09:25:19.9301817Z Traceback (most recent call last): 2025-12-04T09:25:19.9302368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9302478Z self._join_processes(fn) 2025-12-04T09:25:19.9303064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9303233Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9303849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9303995Z raise RuntimeError(error) 2025-12-04T09:25:19.9304225Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9304339Z Traceback (most recent call last): 2025-12-04T09:25:19.9304882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9304992Z getattr(self, test_name)() 2025-12-04T09:25:19.9305531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9305618Z fn() 2025-12-04T09:25:19.9306126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9306234Z method(*args, **kwargs) 2025-12-04T09:25:19.9306737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9306844Z method(*args, **kwargs) 2025-12-04T09:25:19.9307345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9307439Z with policy(): 2025-12-04T09:25:19.9307959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9308063Z raise RuntimeError(msg) 2025-12-04T09:25:19.9309351Z RuntimeError: CUDA driver API 
confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:25:19.9309366Z 2025-12-04T09:25:19.9309553Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9310293Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9310301Z 2025-12-04T09:25:19.9310540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9310546Z 2025-12-04T09:25:19.9310685Z Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.9310796Z Traceback (most recent call last): 2025-12-04T09:25:19.9311278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9311375Z getattr(self, test_name)() 2025-12-04T09:25:19.9311857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9311938Z fn() 2025-12-04T09:25:19.9312385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9312485Z method(*args, **kwargs) 2025-12-04T09:25:19.9312926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9313021Z method(*args, **kwargs) 2025-12-04T09:25:19.9313468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9313552Z with policy(): 2025-12-04T09:25:19.9314003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9314095Z raise RuntimeError(msg) 2025-12-04T09:25:19.9315188Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.9315225Z 2025-12-04T09:25:19.9315419Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9316126Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9316131Z 2025-12-04T09:25:19.9316366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9316371Z 2025-12-04T09:25:19.9316375Z 2025-12-04T09:25:19.9316564Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9316799Z Process 1 terminated with exit code 10, terminating remaining processes. 
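The "Process 1 terminated with exit code 10, terminating remaining processes." line, together with the _join_processes and _check_return_codes frames above, reflects the usual multi-process test pattern: the parent spawns one worker per rank, joins them, and turns any nonzero child exit code into a test failure. A stripped-down sketch of that pattern follows; it is not PyTorch's MultiProcessTestCase, and run_rank / EXIT_LEAK are names invented for the example.

    # Stripped-down sketch of parent/child exit-code handling in the spirit of
    # _join_processes()/_check_return_codes() from common_distributed.py.
    # Not the real MultiProcessTestCase; run_rank and EXIT_LEAK are invented names.
    import multiprocessing as mp
    import sys

    EXIT_LEAK = 10  # the exit code the workers in this log use for leak-check failures

    def run_rank(rank: int) -> None:
        try:
            # ... the per-rank test body would run here ...
            if rank == 1:
                raise RuntimeError("simulated leak detected on this rank")
        except RuntimeError:
            sys.exit(EXIT_LEAK)  # the child reports failure through its exit code

    def main() -> None:
        ctx = mp.get_context("spawn")
        procs = [ctx.Process(target=run_rank, args=(r,)) for r in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                # Mirrors "Process N exited with error code 10" in the log above.
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

    if __name__ == "__main__":
        main()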
2025-12-04T09:25:19.9317621Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e815db3b6b0b67f1.xml - 2025-12-04T09:25:19.9317767Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9318611Z FAILED [9.2135s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9318715Z Traceback (most recent call last): 2025-12-04T09:25:19.9319205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9319300Z getattr(self, test_name)() 2025-12-04T09:25:19.9319776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9319857Z fn() 2025-12-04T09:25:19.9320306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9320405Z method(*args, **kwargs) 2025-12-04T09:25:19.9321042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9321300Z method(*args, **kwargs) 2025-12-04T09:25:19.9321812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9321905Z with policy(): 2025-12-04T09:25:19.9322409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9322523Z raise RuntimeError(msg) 2025-12-04T09:25:19.9323757Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 
2025-12-04T09:25:19.9323766Z 
2025-12-04T09:25:19.9323984Z To execute this test, run the following from the base repo dir:
2025-12-04T09:25:19.9324769Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T09:25:19.9324776Z 
2025-12-04T09:25:19.9325045Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:25:19.9325050Z 
2025-12-04T09:25:19.9325208Z Process 2 exited with error code 10 and exception:
2025-12-04T09:25:19.9325325Z Traceback (most recent call last):
2025-12-04T09:25:19.9325877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:25:19.9325982Z getattr(self, test_name)()
2025-12-04T09:25:19.9326512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:25:19.9326659Z fn()
2025-12-04T09:25:19.9327170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:25:19.9327286Z method(*args, **kwargs)
2025-12-04T09:25:19.9327912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:25:19.9328016Z method(*args, **kwargs)
2025-12-04T09:25:19.9328531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:25:19.9328629Z with policy():
2025-12-04T09:25:19.9329139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:25:19.9329259Z raise RuntimeError(msg)
2025-12-04T09:25:19.9330508Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424.
2025-12-04T09:25:19.9330516Z 
2025-12-04T09:25:19.9330743Z To execute this test, run the following from the base repo dir:
2025-12-04T09:25:19.9331519Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T09:25:19.9331525Z 
2025-12-04T09:25:19.9331797Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:25:19.9331974Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:25:19.9332152Z ======================= 1 failed, 6 deselected in 9.23s ========================
2025-12-04T09:25:19.9332261Z Got exit code 1
2025-12-04T09:25:19.9332370Z Retrying single test...
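"Retrying single test..." is the runner's flaky-test handling: after the file-level run fails, it reruns just the failing test id in a fresh pytest session (the stepcurrent line in the next session shows only that one item selected) before declaring it FAILED CONSISTENTLY. The sketch below illustrates that rerun-one-test idea with a plain subprocess call; the attempt count and helper name are assumptions, not the actual logic of PyTorch's test runner.

    # Hedged sketch of "retry the single failing test in a fresh session". Not the
    # actual PyTorch test-runner logic; retry_single_test and the attempt count are
    # illustrative assumptions.
    import subprocess
    import sys

    def retry_single_test(test_id: str, attempts: int = 2) -> bool:
        """Rerun one pytest node id in isolated sessions; True if any attempt passes."""
        for attempt in range(1, attempts + 1):
            print(f"Retrying single test... (attempt {attempt})")
            result = subprocess.run([sys.executable, "-m", "pytest", "-x", test_id])
            if result.returncode == 0:
                return True
        return False  # the caller would then report the test as failing consistently

    if __name__ == "__main__":
        node = (
            "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::"
            "TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda"
        )
        if retry_single_test(node):
            print("passed on retry")
        else:
            print(f"FAILED CONSISTENTLY: {node}")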
2025-12-04T09:25:19.9333119Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-788cdb9001b436df.xml 2025-12-04T09:25:19.9333359Z ============================= test session starts ============================== 2025-12-04T09:25:19.9333817Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9333920Z cachedir: .pytest_cache 2025-12-04T09:25:19.9334374Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9334479Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9334584Z configfile: pytest.ini 2025-12-04T09:25:19.9335059Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9335254Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9336026Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9336126Z Running 1 items in this shard 2025-12-04T09:25:19.9336135Z 2025-12-04T09:25:19.9337467Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 09:24:10.014000 35217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 35269 2025-12-04T09:25:19.9337968Z I1204 09:24:10.015000 35217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 35270 2025-12-04T09:25:19.9338469Z I1204 09:24:10.015000 35217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 35271 2025-12-04T09:25:19.9338962Z I1204 09:24:10.016000 35217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 35272 2025-12-04T09:25:19.9341412Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9341557Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9343950Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9344071Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9346469Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9346590Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9349170Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9349334Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9350888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9351015Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9352535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9352661Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9354187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9354302Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9355827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.9355965Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9356355Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9356827Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9357690Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9358116Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9358973Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9359311Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9360137Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9360552Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9361378Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9361799Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9362670Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9363047Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9363887Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9364295Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9365785Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 619642880. 
2025-12-04T09:25:19.9366092Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9366662Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9367723Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9368020Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9368661Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9369122Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9369540Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9369990Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9370852Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9371286Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9372144Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9372479Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9373305Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9373717Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9374541Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9374949Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9375835Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9376266Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9377347Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:25:19.9377810Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9379485Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 730791936. 2025-12-04T09:25:19.9379828Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9380458Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9381665Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9382003Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9382751Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9383271Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9383733Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9384233Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9385203Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9385684Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9386649Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9387027Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9387964Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9388433Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9389540Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9390067Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9390948Z E1204 09:24:17.065000 35272 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9391322Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9392167Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9392576Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9394064Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9394365Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9394928Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9396005Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9396327Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9396943Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9397405Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9397807Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9398264Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9399127Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9399560Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9400419Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9400756Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9401581Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9401986Z 
E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9402816Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9403223Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9404107Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9404483Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9405323Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9405731Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9407207Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9407519Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9408079Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9409156Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9409481Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9410106Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9410587Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9410679Z FAILED [8.9345s] [100%] 2025-12-04T09:25:19.9410684Z 2025-12-04T09:25:19.9410825Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9411129Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___ 2025-12-04T09:25:19.9411247Z Traceback (most recent call last): 2025-12-04T09:25:19.9411728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9411832Z self._join_processes(fn) 2025-12-04T09:25:19.9412362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9412491Z 
self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9413033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9413143Z raise RuntimeError(error) 2025-12-04T09:25:19.9413349Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9413466Z Traceback (most recent call last): 2025-12-04T09:25:19.9413944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9414047Z getattr(self, test_name)() 2025-12-04T09:25:19.9414525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9414607Z fn() 2025-12-04T09:25:19.9415058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9415210Z method(*args, **kwargs) 2025-12-04T09:25:19.9415665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9415765Z method(*args, **kwargs) 2025-12-04T09:25:19.9416276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9416368Z with policy(): 2025-12-04T09:25:19.9417032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9417144Z raise RuntimeError(msg) 2025-12-04T09:25:19.9418395Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 619642880. 2025-12-04T09:25:19.9418404Z 2025-12-04T09:25:19.9418623Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9419409Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9419416Z 2025-12-04T09:25:19.9419689Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9419694Z 2025-12-04T09:25:19.9419699Z 2025-12-04T09:25:19.9419915Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9420184Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:25:19.9421277Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-788cdb9001b436df.xml - 2025-12-04T09:25:19.9421525Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9422468Z FAILED [8.9345s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9422631Z Traceback (most recent call last): 2025-12-04T09:25:19.9423193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9423304Z getattr(self, test_name)() 2025-12-04T09:25:19.9423839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9423940Z fn() 2025-12-04T09:25:19.9424452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9424566Z method(*args, **kwargs) 2025-12-04T09:25:19.9425069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9425176Z method(*args, **kwargs) 2025-12-04T09:25:19.9425683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9425780Z with policy(): 2025-12-04T09:25:19.9426286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9426407Z raise RuntimeError(msg) 2025-12-04T09:25:19.9427645Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 619642880. 2025-12-04T09:25:19.9427654Z 2025-12-04T09:25:19.9427878Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9428741Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9428750Z 2025-12-04T09:25:19.9429022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9429200Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.9429376Z ======================= 1 failed, 7 deselected in 8.96s ======================== 2025-12-04T09:25:19.9429483Z Got exit code 1 2025-12-04T09:25:19.9429592Z Retrying single test... 
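The failure above comes from the CUDA memory leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: the harness records caching-allocator and driver-level allocation on each device before the test, raises if either has grown afterwards, and then re-runs the single failing test in isolation ("Retrying single test..."). A minimal sketch of that before/after comparison follows; it is only an approximation, not the actual implementation in torch/testing/_internal/common_utils.py, and the helper name assert_no_cuda_leak is hypothetical.

# Illustrative sketch only: approximates the before/after check that
# PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 performs; not the code in
# torch/testing/_internal/common_utils.py. The helper name is hypothetical.
import contextlib
import torch

@contextlib.contextmanager
def assert_no_cuda_leak(device: int = 0):
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)    # caching allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)  # driver-level view
    driver_before = total - free_before
    try:
        yield
    finally:
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after
        if alloc_after > alloc_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: caching allocator went "
                f"from {alloc_before} to {alloc_after} bytes "
                f"(driver allocated {driver_before} -> {driver_after})"
            )

# Usage sketch: wrap a test body, analogous to the policy() context manager
# referenced in the tracebacks above.
# with assert_no_cuda_leak(torch.cuda.current_device()):
#     run_test_body()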
2025-12-04T09:25:19.9430352Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9601a812ff315158.xml 2025-12-04T09:25:19.9430522Z ============================= test session starts ============================== 2025-12-04T09:25:19.9430869Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9430990Z cachedir: .pytest_cache 2025-12-04T09:25:19.9431509Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9431629Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9431740Z configfile: pytest.ini 2025-12-04T09:25:19.9432270Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9432587Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9433482Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9433608Z Running 1 items in this shard 2025-12-04T09:25:19.9433613Z 2025-12-04T09:25:19.9434637Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 09:24:23.644000 35610 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 35662 2025-12-04T09:25:19.9435107Z I1204 09:24:23.645000 35610 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 35663 2025-12-04T09:25:19.9435550Z I1204 09:24:23.646000 35610 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 35664 2025-12-04T09:25:19.9435980Z I1204 09:24:23.646000 35610 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 35665 2025-12-04T09:25:19.9438105Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9438219Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9440327Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9440436Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9442770Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9442881Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9445116Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9445225Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9446852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9446976Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9448589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9448743Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9450357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9450497Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9452329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.9452445Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9452874Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9453368Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9454323Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9454785Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9455719Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9456080Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9457324Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9457796Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9458725Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9459190Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9460116Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9460533Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9461468Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9461926Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9463589Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 619642880. 
2025-12-04T09:25:19.9463954Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9464589Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9465819Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9466148Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9466843Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9467357Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9467904Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9468386Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9469330Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9469799Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9470730Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9471090Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9472128Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9472563Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9473431Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9473856Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9474743Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9475135Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9476016Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:25:19.9476446Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9478025Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9478366Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9478956Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9480118Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9480523Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9481133Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9481587Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9481968Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9482412Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9483268Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9483694Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9484545Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9484875Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9485766Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9486176Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9487001Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9487402Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9488230Z E1204 09:24:30.691000 35662 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9488600Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9489430Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9489834Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9491304Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 2025-12-04T09:25:19.9491630Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9492217Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9493290Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9493582Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9494193Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9494649Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9495022Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9495472Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9496387Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9497019Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9497977Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9498411Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9499345Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9499798Z 
E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9500734Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9501186Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9502123Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9502540Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9503479Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9503938Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9505587Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9505962Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9506617Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9507821Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9508154Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9508955Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9509525Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9509613Z FAILED [8.9096s] [100%] 2025-12-04T09:25:19.9509619Z 2025-12-04T09:25:19.9509753Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9510049Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___ 2025-12-04T09:25:19.9510161Z Traceback (most recent call last): 2025-12-04T09:25:19.9510644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9510740Z self._join_processes(fn) 2025-12-04T09:25:19.9511265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9511390Z 
self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9511979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9512088Z raise RuntimeError(error) 2025-12-04T09:25:19.9512292Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.9512403Z Traceback (most recent call last): 2025-12-04T09:25:19.9512877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9512975Z getattr(self, test_name)() 2025-12-04T09:25:19.9513451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9513527Z fn() 2025-12-04T09:25:19.9513978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9514073Z method(*args, **kwargs) 2025-12-04T09:25:19.9514522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9514621Z method(*args, **kwargs) 2025-12-04T09:25:19.9515063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9515151Z with policy(): 2025-12-04T09:25:19.9515603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9515696Z raise RuntimeError(msg) 2025-12-04T09:25:19.9516795Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9516827Z 2025-12-04T09:25:19.9517015Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9517703Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9517733Z 2025-12-04T09:25:19.9517969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9517973Z 2025-12-04T09:25:19.9517977Z 2025-12-04T09:25:19.9518168Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9518402Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:25:19.9519229Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9601a812ff315158.xml - 2025-12-04T09:25:19.9519378Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9520217Z FAILED [8.9096s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.9520324Z Traceback (most recent call last): 2025-12-04T09:25:19.9520943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9521043Z getattr(self, test_name)() 2025-12-04T09:25:19.9521720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9521815Z fn() 2025-12-04T09:25:19.9522317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9522428Z method(*args, **kwargs) 2025-12-04T09:25:19.9522931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9523031Z method(*args, **kwargs) 2025-12-04T09:25:19.9523629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9523727Z with policy(): 2025-12-04T09:25:19.9524234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9524343Z raise RuntimeError(msg) 2025-12-04T09:25:19.9525577Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9525584Z 2025-12-04T09:25:19.9525802Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9526573Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9526578Z 2025-12-04T09:25:19.9526849Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9527026Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:25:19.9527203Z ======================= 1 failed, 7 deselected in 8.93s ======================== 2025-12-04T09:25:19.9527302Z Got exit code 1 2025-12-04T09:25:19.9527999Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9528404Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.9529506Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c4b6ce2b260b8d4b.xml 2025-12-04T09:25:19.9529663Z ============================= test session starts ============================== 2025-12-04T09:25:19.9530022Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9530179Z cachedir: .pytest_cache 2025-12-04T09:25:19.9530690Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9530817Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9530917Z configfile: pytest.ini 2025-12-04T09:25:19.9531456Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9531659Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9531799Z stepcurrent: skipping 7 already run items. 2025-12-04T09:25:19.9531917Z Running 1 items in this shard 2025-12-04T09:25:19.9531922Z 2025-12-04T09:25:19.9533057Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda I1204 09:24:37.314000 36003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 36055 2025-12-04T09:25:19.9533673Z I1204 09:24:37.314000 36003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 36056 2025-12-04T09:25:19.9534105Z I1204 09:24:37.315000 36003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 36057 2025-12-04T09:25:19.9534536Z I1204 09:24:37.316000 36003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 36058 2025-12-04T09:25:19.9538392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 
2025-12-04T09:25:19.9538801Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9542600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9543003Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9546803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9547319Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9551315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9551690Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9554055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9554375Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9556728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9556992Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9559340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9559598Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9562244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9562527Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9564875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9565061Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9567416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.9567580Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9569915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9570077Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9572566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9572717Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9573095Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9573543Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9574412Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9574840Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9575708Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9576033Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9577133Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9577679Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9578612Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9579106Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9580040Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9580458Z E1204 09:24:44.638000 36057 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9581392Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9581861Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9583492Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9583826Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9584458Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9585698Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9586038Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9586722Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9587240Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9587657Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9588152Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9589203Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9589626Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9590482Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9590805Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9591632Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9592076Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9592904Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9593336Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9594154Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9594524Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9595355Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9595764Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9597215Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 2025-12-04T09:25:19.9597508Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9598069Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9599171Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9599476Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9600080Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9600537Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9600918Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9601363Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9602239Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9602662Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9603517Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9603841Z E1204 09:24:44.639000 36058 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9604657Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9605097Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9605944Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9606352Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9607174Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9607546Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9608383Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9608792Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9610245Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 502202368 and is now 611254272. 
2025-12-04T09:25:19.9610539Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9611103Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9612204Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9612505Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9613112Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9613566Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9613944Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9614391Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9615255Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9615675Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9616767Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9617147Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9618180Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9618647Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9619606Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9620065Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9621160Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9621582Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9622523Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:25:19.9622981Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9624619Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 527368192 and is now 611254272. 2025-12-04T09:25:19.9624951Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9625593Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9626868Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9627202Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9627891Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9628408Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9628517Z FAILED [9.2355s] [100%] 2025-12-04T09:25:19.9628524Z 2025-12-04T09:25:19.9628668Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9629000Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____ 2025-12-04T09:25:19.9629134Z Traceback (most recent call last): 2025-12-04T09:25:19.9629678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9629797Z self._join_processes(fn) 2025-12-04T09:25:19.9630378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9630515Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9631124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9631234Z raise RuntimeError(error) 2025-12-04T09:25:19.9631516Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.9631631Z Traceback (most recent call last): 2025-12-04T09:25:19.9632174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9632328Z getattr(self, test_name)() 2025-12-04T09:25:19.9633062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9633141Z fn() 2025-12-04T09:25:19.9633596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9633689Z method(*args, **kwargs) 2025-12-04T09:25:19.9634146Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9634234Z method(*args, **kwargs) 2025-12-04T09:25:19.9634682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9634775Z with policy(): 2025-12-04T09:25:19.9635226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9635323Z raise RuntimeError(msg) 2025-12-04T09:25:19.9636398Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 502202368 and is now 611254272. 2025-12-04T09:25:19.9636404Z 2025-12-04T09:25:19.9636593Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9637274Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9637281Z 2025-12-04T09:25:19.9637513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9637518Z 2025-12-04T09:25:19.9637522Z 2025-12-04T09:25:19.9637724Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9638019Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.9638847Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c4b6ce2b260b8d4b.xml - 2025-12-04T09:25:19.9639001Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9639816Z FAILED [9.2355s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.9639931Z Traceback (most recent call last): 2025-12-04T09:25:19.9640413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9640510Z getattr(self, test_name)() 2025-12-04T09:25:19.9640994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9641076Z fn() 2025-12-04T09:25:19.9641528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9641618Z method(*args, **kwargs) 2025-12-04T09:25:19.9642061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9642155Z method(*args, **kwargs) 2025-12-04T09:25:19.9642599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9642710Z with policy(): 2025-12-04T09:25:19.9643161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9643255Z raise RuntimeError(msg) 2025-12-04T09:25:19.9644338Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 502202368 and is now 611254272. 2025-12-04T09:25:19.9644370Z 2025-12-04T09:25:19.9644561Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9645235Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9645246Z 2025-12-04T09:25:19.9645478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9645633Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.9645792Z ======================= 1 failed, 7 deselected in 9.26s ======================== 2025-12-04T09:25:19.9645875Z Got exit code 1 2025-12-04T09:25:19.9645965Z Retrying single test... 2025-12-04T09:25:19.9646642Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-490a12d48ec816b9.xml 2025-12-04T09:25:19.9646783Z ============================= test session starts ============================== 2025-12-04T09:25:19.9647095Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9647185Z cachedir: .pytest_cache 2025-12-04T09:25:19.9647644Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9647752Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9647846Z configfile: pytest.ini 2025-12-04T09:25:19.9648316Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9648504Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9649300Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9649404Z Running 1 items in this shard 2025-12-04T09:25:19.9649408Z 2025-12-04T09:25:19.9650402Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda I1204 09:24:51.274000 36396 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 36448 2025-12-04T09:25:19.9650842Z I1204 09:24:51.275000 36396 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 36449 2025-12-04T09:25:19.9651287Z I1204 09:24:51.276000 36396 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 36450 2025-12-04T09:25:19.9651721Z I1204 09:24:51.277000 36396 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 36451 2025-12-04T09:25:19.9655112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. 
If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9655487Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9659405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9659842Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9663647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9664046Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9667879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 
2025-12-04T09:25:19.9668271Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9670778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9671037Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9673472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9673769Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9675992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9676242Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9678459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9678701Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9680922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.9681126Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9683362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9683513Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9685721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9685872Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9688095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.9688265Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9688652Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9689125Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9689992Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9690414Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9691275Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9691603Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9692441Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9692848Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9693670Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9694078Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9694963Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9695340Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9696220Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9696809Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9698445Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 
2025-12-04T09:25:19.9698785Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9699422Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9700606Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9700942Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9701633Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9702190Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9702618Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9703144Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9704118Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9704594Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9705565Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9705931Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9706860Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9707319Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9708245Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9708715Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9709714Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9710096Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9710923Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:25:19.9711329Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9712780Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:25:19.9713079Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9713641Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9714893Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9715210Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9715882Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9716367Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9716796Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9717260Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9718177Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9718622Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9719536Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9719883Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9720941Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9721561Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9722486Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9722946Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9723964Z E1204 09:24:58.585000 36449 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9724383Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9725326Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9725783Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9727426Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 527368192 and is now 611254272. 2025-12-04T09:25:19.9727760Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9728394Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9729584Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9729922Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9730649Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9731168Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9731629Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9732126Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9733106Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9733692Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9734631Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9734994Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9735891Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9736413Z E1204 
09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9737505Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9737970Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9738961Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9739378Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9740316Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9740774Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9742425Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 527368192 and is now 611254272. 2025-12-04T09:25:19.9742760Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9743392Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9744573Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9744931Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9745626Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9746149Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9746301Z FAILED [9.2340s] [100%] 2025-12-04T09:25:19.9746307Z 2025-12-04T09:25:19.9746455Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9746784Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____ 2025-12-04T09:25:19.9746911Z Traceback (most recent call last): 2025-12-04T09:25:19.9747458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9747576Z self._join_processes(fn) 2025-12-04T09:25:19.9748158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9748299Z self._check_return_codes(fn, 
elapsed_time) 2025-12-04T09:25:19.9748992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9749099Z raise RuntimeError(error) 2025-12-04T09:25:19.9749315Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.9749434Z Traceback (most recent call last): 2025-12-04T09:25:19.9749939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9750045Z getattr(self, test_name)() 2025-12-04T09:25:19.9750544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9750626Z fn() 2025-12-04T09:25:19.9751104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9751204Z method(*args, **kwargs) 2025-12-04T09:25:19.9751728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9751839Z method(*args, **kwargs) 2025-12-04T09:25:19.9752311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9752407Z with policy(): 2025-12-04T09:25:19.9752885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9752984Z raise RuntimeError(msg) 2025-12-04T09:25:19.9754127Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 527368192 and is now 611254272. 2025-12-04T09:25:19.9754135Z 2025-12-04T09:25:19.9754334Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9755060Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9755067Z 2025-12-04T09:25:19.9755416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9755421Z 2025-12-04T09:25:19.9755425Z 2025-12-04T09:25:19.9755623Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9755852Z Process 2 terminated with exit code 10, terminating remaining processes. 
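The FutureWarning that every rank prints above points away from FSDP.set_state_dict_type() and toward the torch.distributed.checkpoint state-dict helpers named in the message. A minimal sketch of that replacement path, assuming an already-initialized process group, a single FSDP-wrapped model, and one optimizer (the toy model and optimizer below are illustrative, not taken from this test):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

# Hypothetical toy setup; the real test wraps its own module hierarchy.
model = FSDP(nn.Linear(8, 8).cuda())
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# Replaces FSDP.set_state_dict_type() + state_dict(): one call returns both the
# model and optimizer state dicts in a parallelism-agnostic format.
model_sd, optim_sd = get_state_dict(model, optim)

# Restoring goes through set_state_dict() rather than load_state_dict() under a
# state_dict_type context.
set_state_dict(model, optim, model_state_dict=model_sd, optim_state_dict=optim_sd)

Both calls assume torch.distributed is initialized on every rank, which is what the spawned per-rank test processes above set up before the test body runs.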
2025-12-04T09:25:19.9756676Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-490a12d48ec816b9.xml -
2025-12-04T09:25:19.9756861Z =========================== short test summary info ============================
2025-12-04T09:25:19.9757687Z FAILED [9.2340s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 2 exited with error code 10 and exception:
2025-12-04T09:25:19.9757823Z Traceback (most recent call last):
2025-12-04T09:25:19.9758308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:25:19.9758406Z getattr(self, test_name)()
2025-12-04T09:25:19.9758886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:25:19.9758964Z fn()
2025-12-04T09:25:19.9759407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:25:19.9759507Z method(*args, **kwargs)
2025-12-04T09:25:19.9759952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:25:19.9760057Z method(*args, **kwargs)
2025-12-04T09:25:19.9760500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:25:19.9760582Z with policy():
2025-12-04T09:25:19.9761037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:25:19.9761131Z raise RuntimeError(msg)
2025-12-04T09:25:19.9762205Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 527368192 and is now 611254272.
2025-12-04T09:25:19.9762213Z
2025-12-04T09:25:19.9762399Z To execute this test, run the following from the base repo dir:
2025-12-04T09:25:19.9763182Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda
2025-12-04T09:25:19.9763189Z
2025-12-04T09:25:19.9763429Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:25:19.9763584Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:25:19.9763745Z ======================= 1 failed, 7 deselected in 9.26s ========================
2025-12-04T09:25:19.9763833Z Got exit code 1
2025-12-04T09:25:19.9763924Z Retrying single test...
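The RuntimeError above comes from the memory-leak check enabled for this shard: per device it snapshots the caching-allocator bytes and the driver-level allocation (total minus free memory) before and after the test, and fails when the caching-allocator count grows and the driver-level numbers confirm it. A rough, self-contained illustration of that bookkeeping with public torch.cuda calls, not the harness's actual implementation in common_utils.py (the 512-byte tensor is only an example of the kind of stray allocation being reported):

import torch

def snapshot(device: int):
    # Bytes currently held by the CUDA caching allocator on this device.
    alloc = torch.cuda.memory_allocated(device)
    # Driver-level view of the same device: total minus free memory.
    free, total = torch.cuda.mem_get_info(device)
    return alloc, total - free

assert torch.cuda.is_available()
device = 0
alloc_before, driver_before = snapshot(device)

# Anything still referenced when the test body returns shows up as a delta;
# 128 float32 elements is 512 bytes, matching the size reported in the log.
stray = torch.empty(128, dtype=torch.float32, device=device)

alloc_after, driver_after = snapshot(device)
print(f"caching allocator: {alloc_before} -> {alloc_after}")
print(f"driver allocated:  {driver_before} -> {driver_after}")

Running the repro command printed in the log (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda) turns the same check on for a single local run of the failing test.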
2025-12-04T09:25:19.9764599Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2f9fc6fa3a79028.xml 2025-12-04T09:25:19.9764742Z ============================= test session starts ============================== 2025-12-04T09:25:19.9765048Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9765150Z cachedir: .pytest_cache 2025-12-04T09:25:19.9765605Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9765718Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9765813Z configfile: pytest.ini 2025-12-04T09:25:19.9766284Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9766472Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9767220Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9767350Z Running 1 items in this shard 2025-12-04T09:25:19.9767355Z 2025-12-04T09:25:19.9768356Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda I1204 09:25:05.174000 36789 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 36841 2025-12-04T09:25:19.9768823Z I1204 09:25:05.174000 36789 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 36842 2025-12-04T09:25:19.9769265Z I1204 09:25:05.175000 36789 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 36843 2025-12-04T09:25:19.9769700Z I1204 09:25:05.176000 36789 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 36844 2025-12-04T09:25:19.9773098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9773450Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9777156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. 
DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9777557Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9781359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9781752Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9785539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9785986Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9788589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9788858Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9791185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9791430Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9793689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9793933Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9796160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9796404Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9798620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9798774Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9800985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9801197Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9803417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.9803566Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9805796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9805942Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9806323Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9806768Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9807649Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9808119Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9808984Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9809308Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9810137Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9810553Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9811383Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9811795Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9812618Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9812995Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9813817Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9814267Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9815745Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 527368192 and is now 617545728. 2025-12-04T09:25:19.9816040Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9816840Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9818030Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9818375Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9819060Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9819574Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9820002Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9820501Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9821664Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9822247Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9823227Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9823596Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9824528Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9824995Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9825930Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9826388Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9827318Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9827733Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:25:19.9828710Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9829172Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9830844Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:25:19.9831175Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9831808Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9833096Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9833415Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9834058Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9834541Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9834942Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9835410Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9836386Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9836838Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9837738Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9838090Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9838963Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9839396Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9840273Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9840707Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9841579Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9841967Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9842879Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9843312Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9844872Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 649003008 and is now 720306176. 2025-12-04T09:25:19.9845183Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9845778Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9846894Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9847205Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9847858Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9848339Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9848734Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9849204Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9850176Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9850635Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9851534Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9851885Z E1204 09:25:12.429000 36844 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9852757Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9853196Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9854169Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9854572Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9855399Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9855767Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9856895Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9857399Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9859034Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 456065024 and is now 611254272. 
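(Context for the four RuntimeErrors above: this job runs with the CUDA memory-leak check enabled, so the test harness records the caching-allocator and driver-allocated byte counts on every visible device before the test and compares them afterwards; any growth is reported as a leak, which is exactly what the "allocated memory was 0 and is now reported as 512" messages show for devices 0-3. The snippet below is only an illustration of that before/after comparison using public torch.cuda counters; it is not the actual leak-check implementation in torch/testing/_internal/common_utils.py, and the helper name run_with_leak_check is made up for this note.)

# Illustrative sketch of a before/after CUDA allocation comparison, assuming a
# CUDA-capable machine. Not the real CudaMemoryLeakCheck code used by the harness.
import gc
import torch

def run_with_leak_check(fn):
    assert torch.cuda.is_available()
    torch.cuda.synchronize()
    before = [torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())]
    fn()  # the test body
    gc.collect()                  # drop Python-side references first
    torch.cuda.synchronize()
    torch.cuda.empty_cache()      # release cached blocks so only live tensors are counted
    after = [torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())]
    for dev, (b, a) in enumerate(zip(before, after)):
        if a > b:
            raise RuntimeError(f"possible CUDA leak on device {dev}: {b} -> {a} bytes")

The log already prints the exact repro command for this case: PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda.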
2025-12-04T09:25:19.9859370Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9860004Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9861202Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9861536Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9862235Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9862749Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9862855Z FAILED [9.1029s] [100%] 2025-12-04T09:25:19.9862863Z 2025-12-04T09:25:19.9863010Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9863341Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____ 2025-12-04T09:25:19.9863466Z Traceback (most recent call last): 2025-12-04T09:25:19.9864065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9864178Z self._join_processes(fn) 2025-12-04T09:25:19.9864767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9864903Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9865514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9865625Z raise RuntimeError(error) 2025-12-04T09:25:19.9865859Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.9865984Z Traceback (most recent call last): 2025-12-04T09:25:19.9866521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9866631Z getattr(self, test_name)() 2025-12-04T09:25:19.9867176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9867265Z fn() 2025-12-04T09:25:19.9867776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9867879Z method(*args, **kwargs) 2025-12-04T09:25:19.9868380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9868596Z method(*args, **kwargs) 2025-12-04T09:25:19.9869183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9869307Z with policy(): 2025-12-04T09:25:19.9869781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9869886Z raise RuntimeError(msg) 2025-12-04T09:25:19.9871063Z RuntimeError: CUDA driver API 
confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 649003008 and is now 720306176. 2025-12-04T09:25:19.9871069Z 2025-12-04T09:25:19.9871271Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9871995Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9872003Z 2025-12-04T09:25:19.9872247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9872252Z 2025-12-04T09:25:19.9872256Z 2025-12-04T09:25:19.9872459Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9872716Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.9873590Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2f9fc6fa3a79028.xml - 2025-12-04T09:25:19.9873753Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9874616Z FAILED [9.1029s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.9874729Z Traceback (most recent call last): 2025-12-04T09:25:19.9875252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9875353Z getattr(self, test_name)() 2025-12-04T09:25:19.9875907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9875990Z fn() 2025-12-04T09:25:19.9876465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9876565Z method(*args, **kwargs) 2025-12-04T09:25:19.9877036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9877133Z method(*args, **kwargs) 2025-12-04T09:25:19.9877606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9877692Z with policy(): 2025-12-04T09:25:19.9878170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9878269Z raise RuntimeError(msg) 2025-12-04T09:25:19.9879485Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 649003008 and is now 720306176. 
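(Separate from the leak failure itself: the FutureWarnings captured earlier in this log recommend replacing FSDP.state_dict_type()/FSDP.set_state_dict_type() with the torch.distributed.checkpoint.state_dict helpers. The sketch below is a hedged outline of that migration, not the test's code: `model` and `optimizer` are caller-supplied placeholders for an FSDP-wrapped module and its optimizer inside an already-initialized process group, and the keyword names follow the API doc linked in the warning.)

# Hedged migration sketch for the get_state_dict()/set_state_dict() API named in
# the FutureWarning above; `model` and `optimizer` are placeholders supplied by
# the caller (FSDP-wrapped module plus its optimizer, process group initialized).
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

def state_dict_roundtrip(model, optimizer):
    # Replaces the deprecated FullyShardedDataParallel.set_state_dict_type(...) context:
    model_state, optim_state = get_state_dict(model, optimizer)
    # ... persist / reload model_state and optim_state, e.g. via torch.distributed.checkpoint ...
    set_state_dict(
        model,
        optimizer,
        model_state_dict=model_state,
        optim_state_dict=optim_state,
    )
    return model_state, optim_state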
2025-12-04T09:25:19.9879499Z
2025-12-04T09:25:19.9879686Z To execute this test, run the following from the base repo dir:
2025-12-04T09:25:19.9880358Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda
2025-12-04T09:25:19.9880363Z
2025-12-04T09:25:19.9880602Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:25:19.9880757Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:25:19.9880939Z ======================= 1 failed, 7 deselected in 9.12s ========================
2025-12-04T09:25:19.9881028Z Got exit code 1
2025-12-04T09:25:19.9881647Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda
2025-12-04T09:25:19.9882043Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:25:19.9882705Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-384ab9a5685ff7be.xml
2025-12-04T09:25:19.9882847Z ============================= test session starts ==============================
2025-12-04T09:25:19.9883161Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:25:19.9883253Z cachedir: .pytest_cache
2025-12-04T09:25:19.9883720Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:25:19.9883825Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:25:19.9883919Z configfile: pytest.ini
2025-12-04T09:25:19.9884404Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:25:19.9884587Z collecting ... collected 8 items / 8 deselected / 0 selected
2025-12-04T09:25:19.9884711Z stepcurrent: skipping 8 already run items.
2025-12-04T09:25:19.9884817Z Running 0 items in this shard
2025-12-04T09:25:19.9884821Z
2025-12-04T09:25:19.9885646Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-384ab9a5685ff7be.xml -
2025-12-04T09:25:19.9885794Z ============================ 8 deselected in 0.01s =============================
2025-12-04T09:25:19.9891735Z The following tests failed consistently: ['test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda']
2025-12-04T09:25:19.9891758Z
2025-12-04T09:25:19.9892411Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 (test/test-reports/distributed.fsdp.test_hsdp_dtensor_state_dict_1.1_8591eb8b13b136e6_.log)
2025-12-04T09:25:19.9892415Z
2025-12-04T09:25:19.9892827Z Finished distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 ...
[2025-12-04 09:25:19.633142][1951.241055853], took 5.73min 2025-12-04T09:25:19.9893708Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a78dec0d79621f36.xml 2025-12-04T09:25:19.9894585Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9a14ac4718e66e44.xml 2025-12-04T09:25:19.9895497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7d115d367e840460.xml 2025-12-04T09:25:19.9896497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-724e16d7d24ec18b.xml 2025-12-04T09:25:19.9897651Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1c81c8f34feb9c16.xml 2025-12-04T09:25:19.9898632Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a326f09bb7c5e616.xml 2025-12-04T09:25:19.9899614Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7096ae518bc839e.xml 2025-12-04T09:25:19.9900599Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-dbe06a751e4355d9.xml 2025-12-04T09:25:19.9901596Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7f21dedd43754e1.xml 2025-12-04T09:25:19.9902571Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7dbc99509eb0f4ce.xml 2025-12-04T09:25:19.9903555Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b4af92028672eb6.xml 2025-12-04T09:25:19.9904616Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c67b11ef8bde4252.xml 2025-12-04T09:25:20.0059734Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c057f5798619892b.xml 2025-12-04T09:25:20.0335728Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-aae1a2ba6806c0ef.xml 2025-12-04T09:25:20.0875973Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c34ce2d8050066e8.xml 2025-12-04T09:25:20.1203468Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-fde5b3ce12e5a98a.xml 2025-12-04T09:25:20.1459818Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b1cbedcab1229122.xml 2025-12-04T09:25:20.1792355Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6d24496891daae4f.xml 2025-12-04T09:25:20.2053467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e815db3b6b0b67f1.xml 2025-12-04T09:25:20.2292600Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-788cdb9001b436df.xml 2025-12-04T09:25:20.2590524Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9601a812ff315158.xml 2025-12-04T09:25:20.2843040Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c4b6ce2b260b8d4b.xml 2025-12-04T09:25:20.3161955Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-490a12d48ec816b9.xml 2025-12-04T09:25:20.3415749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2f9fc6fa3a79028.xml 2025-12-04T09:25:20.3633753Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-384ab9a5685ff7be.xml 2025-12-04T09:25:20.6244043Z Uploading logs for 57116084904 to S3 2025-12-04T09:25:20.6739000Z Uploading artifacts took 0.29 seconds 2025-12-04T09:25:20.6739514Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 failed! 2025-12-04T09:25:20.6747954Z Running distributed/fsdp/test_fsdp_clip_grad_norm 1/1 ... [2025-12-04 09:25:20.674290][1952.282206742] 2025-12-04T09:25:20.6748621Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:25:20.6749920Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_clip_grad_norm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:25:20.674671] 2025-12-04T09:28:44.9621248Z 2025-12-04T09:28:44.9622618Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_clip_grad_norm 1/1 (test/test-reports/distributed.fsdp.test_fsdp_clip_grad_norm_1.1_4959fae61140b3a8_.log) 2025-12-04T09:28:44.9624232Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a06a4188d644524d.xml 2025-12-04T09:28:44.9625272Z ============================= test session starts ============================== 2025-12-04T09:28:44.9625938Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:44.9626670Z cachedir: .pytest_cache 2025-12-04T09:28:44.9627363Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:44.9628118Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:44.9628464Z configfile: pytest.ini 2025-12-04T09:28:44.9629172Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:44.9630069Z collecting ... collected 4 items 2025-12-04T09:28:44.9630472Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:28:44.9632716Z Running 4 items in this shard: test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda, test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda, test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda, test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda 2025-12-04T09:28:44.9634750Z 2025-12-04T09:28:44.9635663Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda I1204 09:25:24.094000 37239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 37291 2025-12-04T09:28:44.9637323Z I1204 09:25:24.095000 37239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 37292 2025-12-04T09:28:44.9638450Z I1204 09:25:24.096000 37239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 37293 2025-12-04T09:28:44.9639648Z I1204 09:25:24.097000 37239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 37294 2025-12-04T09:28:44.9641470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9642953Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9644413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9645880Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9647326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 
2025-12-04T09:28:44.9648765Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9650201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9651659Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9653628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9655602Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9657854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9659880Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9661900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9663913Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9665942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9667994Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9669377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:44.9670591Z return func(*args, **kwargs) 2025-12-04T09:28:44.9671776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9672961Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9674105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:44.9675285Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9676446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9677628Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9678777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9679953Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9681137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9682338Z fsdp_model = FSDP( 2025-12-04T09:28:44.9683458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9684653Z fsdp_model = FSDP( 2025-12-04T09:28:44.9685856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9687054Z fsdp_model = FSDP( 2025-12-04T09:28:44.9688169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9689359Z fsdp_model = FSDP( 2025-12-04T09:28:44.9693917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:44.9699130Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:44.9704182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:44.9709347Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:44.9714250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:44.9719084Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:44.9724519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:44.9729526Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:44.9731026Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9732250Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:44.9733568Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:44.9734761Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:44.9735954Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9737515Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:44.9738748Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9740025Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:44.9740762Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:44.9741898Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:44.9743588Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9745256Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:44.9746909Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9748545Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:44.9750196Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9751699Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9753203Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9754758Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9756258Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9757729Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:44.9759345Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9761043Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:44.9763507Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 0. CUDA driver allocated memory was 714014720 and is now 804192256. 2025-12-04T09:28:44.9765646Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9766769Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9768536Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9770047Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9771244Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9772629Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:44.9773821Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:44.9774880Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:44.9776552Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9778390Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:44.9780024Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9781565Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:44.9783072Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9784679Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9786326Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9787927Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9789668Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T09:28:44.9791056Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:44.9792437Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9793853Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:44.9795782Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 1. CUDA driver allocated memory was 604962816 and is now 695140352. 2025-12-04T09:28:44.9797579Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9798619Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9800260Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9801607Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9802730Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9803981Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:44.9804998Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:44.9806006Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:44.9807501Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9808966Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:44.9810433Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9811792Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:44.9813130Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9814620Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T09:28:44.9816043Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9817873Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9819469Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9821198Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:44.9822770Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9824375Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:44.9826545Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 2. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:44.9828580Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9829807Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9831646Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9833404Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9834560Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9836068Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:44.9837172Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:44.9838267Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:44.9839902Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9841500Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:44.9843081Z [rank3]:E1204 09:25:43.083000 37294 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9844575Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:44.9846130Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9847788Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9849301Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9850791Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9852292Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9853764Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:44.9855240Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9857193Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:44.9859341Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 491716608 and is now 695140352. 
2025-12-04T09:28:44.9862917Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9864102Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9865964Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9867486Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9868707Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9870158Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:44.9870903Z dist init r=0, world=4 2025-12-04T09:28:44.9871173Z dist init r=1, world=4 2025-12-04T09:28:44.9871426Z dist init r=3, world=4 2025-12-04T09:28:44.9871692Z dist init r=2, world=4 2025-12-04T09:28:44.9872960Z [rank0]:[W1204 09:25:43.138329303 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:44.9874268Z FAILED [21.5235s] [ 25%] 2025-12-04T09:28:44.9874456Z 2025-12-04T09:28:44.9874603Z =================================== FAILURES =================================== 2025-12-04T09:28:44.9875134Z __________________ TestClipGradNormCUDA.test_ddp_parity_cuda ___________________ 2025-12-04T09:28:44.9875635Z Traceback (most recent call last): 2025-12-04T09:28:44.9876376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:44.9877130Z self._join_processes(fn) 2025-12-04T09:28:44.9877957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:44.9878787Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:44.9879612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:44.9880427Z raise RuntimeError(error) 2025-12-04T09:28:44.9880852Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:44.9881306Z Traceback (most recent call last): 2025-12-04T09:28:44.9882043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9882796Z getattr(self, test_name)() 2025-12-04T09:28:44.9883509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9884227Z fn() 2025-12-04T09:28:44.9884843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9885560Z method(*args, **kwargs) 2025-12-04T09:28:44.9886227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in 
wrapper 2025-12-04T09:28:44.9886945Z method(*args, **kwargs) 2025-12-04T09:28:44.9887695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9888532Z with policy(): 2025-12-04T09:28:44.9889170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9889928Z raise RuntimeError(msg) 2025-12-04T09:28:44.9891149Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 0. CUDA driver allocated memory was 714014720 and is now 804192256. 2025-12-04T09:28:44.9892305Z 2025-12-04T09:28:44.9892521Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9893372Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9894038Z 2025-12-04T09:28:44.9894288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9894677Z 2025-12-04T09:28:44.9894835Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:44.9895233Z Traceback (most recent call last): 2025-12-04T09:28:44.9895966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9896988Z getattr(self, test_name)() 2025-12-04T09:28:44.9897750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9898517Z fn() 2025-12-04T09:28:44.9899169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9899939Z method(*args, **kwargs) 2025-12-04T09:28:44.9900658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9901406Z method(*args, **kwargs) 2025-12-04T09:28:44.9902116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9902870Z with policy(): 2025-12-04T09:28:44.9903537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9904307Z raise RuntimeError(msg) 2025-12-04T09:28:44.9905653Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 491716608 and is now 695140352. 2025-12-04T09:28:44.9906869Z 2025-12-04T09:28:44.9907093Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9908021Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9908820Z 2025-12-04T09:28:44.9909197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9909574Z 2025-12-04T09:28:44.9909578Z 2025-12-04T09:28:44.9909793Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:44.9910394Z Process 0 terminated with exit code 10, terminating remaining processes. 
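The failure above comes from PyTorch's CUDA memory leak checker (enabled for this shard via mem_leak_check), which records the caching-allocator byte count and the driver-level allocation for each device before the test and compares them afterwards; any growth that survives the test body, here 512 B growing to roughly 1.9 MB per device, is reported as a leak. A minimal sketch of the same before/after pattern using only public torch.cuda counters, purely illustrative and not the in-tree checker's actual implementation:

    import torch

    def assert_no_persistent_growth(fn, device: int = 0) -> None:
        # Warm-up run so lazily initialized CUDA state (context, cuBLAS, NCCL buffers)
        # is not mistaken for a leak on the measured run.
        fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)    # caching-allocator bytes in use
        reserved_before = torch.cuda.memory_reserved(device)  # bytes reserved from the driver
        fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        reserved_after = torch.cuda.memory_reserved(device)
        if alloc_after > alloc_before:
            raise RuntimeError(
                f"possible leak: allocated {alloc_before} -> {alloc_after} bytes, "
                f"reserved {reserved_before} -> {reserved_after} bytes"
            )

The in-tree checker also tracks driver-allocated memory, which is why the error message reports both numbers; typical culprits are tensors kept alive past the test body, for example by a cached module, an un-destroyed process group, or a stashed autograd graph.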
2025-12-04T09:28:44.9911698Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a06a4188d644524d.xml - 2025-12-04T09:28:44.9912773Z =========================== short test summary info ============================ 2025-12-04T09:28:44.9913714Z FAILED [21.5235s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:44.9914601Z Traceback (most recent call last): 2025-12-04T09:28:44.9915313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9916031Z getattr(self, test_name)() 2025-12-04T09:28:44.9916695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9917427Z fn() 2025-12-04T09:28:44.9918012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9918686Z method(*args, **kwargs) 2025-12-04T09:28:44.9919331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9920041Z method(*args, **kwargs) 2025-12-04T09:28:44.9920688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9921692Z with policy(): 2025-12-04T09:28:44.9922382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9923154Z raise RuntimeError(msg) 2025-12-04T09:28:44.9924429Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 0. CUDA driver allocated memory was 714014720 and is now 804192256. 
2025-12-04T09:28:44.9925644Z 2025-12-04T09:28:44.9925861Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9926785Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9927490Z 2025-12-04T09:28:44.9927766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9928166Z 2025-12-04T09:28:44.9928344Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:44.9928754Z Traceback (most recent call last): 2025-12-04T09:28:44.9929541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9930347Z getattr(self, test_name)() 2025-12-04T09:28:44.9931092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9931870Z fn() 2025-12-04T09:28:44.9932525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9933395Z method(*args, **kwargs) 2025-12-04T09:28:44.9934262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9934943Z method(*args, **kwargs) 2025-12-04T09:28:44.9935579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9936297Z with policy(): 2025-12-04T09:28:44.9937123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9937890Z raise RuntimeError(msg) 2025-12-04T09:28:44.9939176Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 491716608 and is now 695140352. 2025-12-04T09:28:44.9940376Z 2025-12-04T09:28:44.9940594Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9941516Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9942222Z 2025-12-04T09:28:44.9942487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9943081Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:44.9943548Z ============================== 1 failed in 21.55s ============================== 2025-12-04T09:28:44.9943948Z Got exit code 1 2025-12-04T09:28:44.9944218Z Retrying single test... 
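Most of the warnings interleaved with this failure are the FutureWarning that the FSDP `NO_SHARD` sharding strategy is deprecated, with `DistributedDataParallel` suggested as the replacement. A minimal sketch of that replacement, where the module, rank, and process-group setup are illustrative assumptions rather than the test's actual code:

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def replicate_instead_of_no_shard(module: torch.nn.Module, rank: int) -> DDP:
        # NO_SHARD kept a full parameter replica on every rank, so plain DDP is the
        # drop-in alternative the warning points to.
        assert dist.is_initialized()
        torch.cuda.set_device(rank)
        return DDP(module.cuda(rank), device_ids=[rank])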
2025-12-04T09:28:44.9945171Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-03186403898f3bbb.xml 2025-12-04T09:28:44.9946173Z ============================= test session starts ============================== 2025-12-04T09:28:44.9946838Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:44.9947481Z cachedir: .pytest_cache 2025-12-04T09:28:44.9948179Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:44.9949052Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:44.9949366Z configfile: pytest.ini 2025-12-04T09:28:44.9950013Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:44.9950791Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:44.9951682Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda 2025-12-04T09:28:44.9952476Z Running 1 items in this shard 2025-12-04T09:28:44.9952663Z 2025-12-04T09:28:44.9953715Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda I1204 09:25:50.084000 37668 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 37720 2025-12-04T09:28:44.9955193Z I1204 09:25:50.084000 37668 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 37721 2025-12-04T09:28:44.9956432Z I1204 09:25:50.085000 37668 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 37722 2025-12-04T09:28:44.9957538Z I1204 09:25:50.086000 37668 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 37723 2025-12-04T09:28:44.9959375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9960876Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9962375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9963832Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9965274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9966735Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9968174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9969627Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9971650Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9973554Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9975460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9977676Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9979727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9981752Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9983769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9985795Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9987107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:44.9988347Z return func(*args, **kwargs) 2025-12-04T09:28:44.9989665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9990760Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9991832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9992910Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9994026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9995112Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9996167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:44.9997231Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9998322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9999427Z fsdp_model = FSDP( 2025-12-04T09:28:45.0000486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0001570Z fsdp_model = FSDP( 2025-12-04T09:28:45.0002608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0003706Z fsdp_model = FSDP( 2025-12-04T09:28:45.0004743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0005850Z fsdp_model = FSDP( 2025-12-04T09:28:45.0009988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0014454Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0019449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:45.0024793Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0029848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0034723Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0039195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0043705Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0045050Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0046138Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0047226Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0048321Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0049419Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0050497Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0051581Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0052668Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0053318Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0054318Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0055881Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0057685Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0059341Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0060878Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0062380Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0063994Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0065596Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0067190Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0068776Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0070299Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0071691Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0073155Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0075083Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 0. CUDA driver allocated memory was 714014720 and is now 804192256. 
2025-12-04T09:28:45.0076893Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0077924Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0079543Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0080903Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0082007Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0083247Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.0084273Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0085288Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0086838Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0088312Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0089764Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0091126Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0092470Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0093887Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0095292Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0096955Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0098633Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0100223Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0101791Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0103419Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0105581Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 2. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:45.0107615Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0108894Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0110643Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0111984Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0113080Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0114331Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.0115348Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0116401Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0117894Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0119370Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0120959Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0122654Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0124160Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0125763Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0127369Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:28:45.0128968Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0130639Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0132189Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0133881Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0135322Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0137572Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 3. CUDA driver allocated memory was 489619456 and is now 695140352. 2025-12-04T09:28:45.0139609Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0140770Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0142596Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0144135Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0145351Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0146761Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.0147987Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0149228Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0150821Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0152367Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0154004Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0155375Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] 
fn() 2025-12-04T09:28:45.0156724Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0158150Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0159560Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0161008Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0162440Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0163856Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0165243Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0166674Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0168617Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 1. CUDA driver allocated memory was 604962816 and is now 695140352. 2025-12-04T09:28:45.0170421Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0171468Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0173073Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0174423Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0175516Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0177105Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.0177906Z dist init r=3, world=4 2025-12-04T09:28:45.0178180Z dist init r=1, world=4 2025-12-04T09:28:45.0178461Z dist init r=2, world=4 2025-12-04T09:28:45.0178741Z dist init r=0, world=4 2025-12-04T09:28:45.0180068Z [rank0]:[W1204 09:26:09.128876599 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.0181461Z FAILED [21.5254s] [100%] 2025-12-04T09:28:45.0181660Z 2025-12-04T09:28:45.0181813Z =================================== FAILURES =================================== 2025-12-04T09:28:45.0182375Z __________________ TestClipGradNormCUDA.test_ddp_parity_cuda ___________________ 2025-12-04T09:28:45.0182896Z Traceback (most recent call last): 2025-12-04T09:28:45.0183692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.0184495Z self._join_processes(fn) 2025-12-04T09:28:45.0185283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.0186157Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.0187043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.0187909Z raise RuntimeError(error) 2025-12-04T09:28:45.0188394Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.0188990Z Traceback (most recent call last): 2025-12-04T09:28:45.0189688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0190404Z getattr(self, test_name)() 2025-12-04T09:28:45.0191100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0191784Z fn() 2025-12-04T09:28:45.0192366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0193031Z method(*args, **kwargs) 2025-12-04T09:28:45.0193668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0194348Z method(*args, **kwargs) 2025-12-04T09:28:45.0194985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0195640Z with policy(): 2025-12-04T09:28:45.0196251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0196938Z raise RuntimeError(msg) 2025-12-04T09:28:45.0198076Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 1. CUDA driver allocated memory was 604962816 and is now 695140352. 
2025-12-04T09:28:45.0199154Z 2025-12-04T09:28:45.0199350Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0200170Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0200791Z 2025-12-04T09:28:45.0201043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0201400Z 2025-12-04T09:28:45.0201559Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.0201923Z Traceback (most recent call last): 2025-12-04T09:28:45.0202677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0203396Z getattr(self, test_name)() 2025-12-04T09:28:45.0204059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0204752Z fn() 2025-12-04T09:28:45.0205331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0206008Z method(*args, **kwargs) 2025-12-04T09:28:45.0206633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0207315Z method(*args, **kwargs) 2025-12-04T09:28:45.0207950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0208606Z with policy(): 2025-12-04T09:28:45.0209218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0210356Z raise RuntimeError(msg) 2025-12-04T09:28:45.0211499Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 2. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:45.0212560Z 2025-12-04T09:28:45.0212754Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0213569Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0214230Z 2025-12-04T09:28:45.0214469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0214822Z 2025-12-04T09:28:45.0214826Z 2025-12-04T09:28:45.0215040Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.0215813Z Process 1 terminated with exit code 10, terminating remaining processes. 
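Both attempts also end with a ProcessGroupNCCL warning that destroy_process_group() was not called before the processes exited (see the shutdown docs linked above). In a standalone distributed script the usual fix is to tear the group down explicitly once every rank is finished; a minimal sketch, where the rank plumbing and run_test are placeholders:

    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
        torch.cuda.set_device(rank)
        # Passing device_id=torch.device("cuda", rank) here would also address the
        # barrier() "using the device under current context" warning seen earlier.
        dist.init_process_group(backend="nccl")
        try:
            run_test(rank)                     # hypothetical per-rank test body
        finally:
            dist.destroy_process_group()       # silences the shutdown warning above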
2025-12-04T09:28:45.0217283Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-03186403898f3bbb.xml - 2025-12-04T09:28:45.0218488Z =========================== short test summary info ============================ 2025-12-04T09:28:45.0219554Z FAILED [21.5254s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.0220552Z Traceback (most recent call last): 2025-12-04T09:28:45.0221538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0222342Z getattr(self, test_name)() 2025-12-04T09:28:45.0223107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0223876Z fn() 2025-12-04T09:28:45.0224528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0225287Z method(*args, **kwargs) 2025-12-04T09:28:45.0226006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0226753Z method(*args, **kwargs) 2025-12-04T09:28:45.0227465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0228219Z with policy(): 2025-12-04T09:28:45.0228889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0229658Z raise RuntimeError(msg) 2025-12-04T09:28:45.0231061Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 1. CUDA driver allocated memory was 604962816 and is now 695140352. 
2025-12-04T09:28:45.0232271Z 2025-12-04T09:28:45.0232610Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0233556Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0234193Z 2025-12-04T09:28:45.0234431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0234804Z 2025-12-04T09:28:45.0234951Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.0235341Z Traceback (most recent call last): 2025-12-04T09:28:45.0236038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0236752Z getattr(self, test_name)() 2025-12-04T09:28:45.0237431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0238122Z fn() 2025-12-04T09:28:45.0238690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0239373Z method(*args, **kwargs) 2025-12-04T09:28:45.0240009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0240674Z method(*args, **kwargs) 2025-12-04T09:28:45.0241312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0242026Z with policy(): 2025-12-04T09:28:45.0242636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0243308Z raise RuntimeError(msg) 2025-12-04T09:28:45.0244454Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 2. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:45.0245571Z 2025-12-04T09:28:45.0245762Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0246575Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0247196Z 2025-12-04T09:28:45.0247430Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0247961Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.0248411Z ======================= 1 failed, 3 deselected in 21.55s ======================= 2025-12-04T09:28:45.0248787Z Got exit code 1 2025-12-04T09:28:45.0249020Z Retrying single test... 
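The failure above comes from PyTorch's CUDA mem-leak checker, which compares the caching-allocator and driver-level memory counters on each device before and after the test body. As a rough illustration only (this is not the harness's internal implementation), the sketch below shows the kind of before/after comparison those two reported numbers correspond to, using public torch.cuda APIs; `snapshot`, `check_for_leak`, and `run_test` are hypothetical names standing in for the real machinery:

    import torch

    def snapshot(device: int) -> dict:
        # Driver-level view: total minus free approximates "CUDA driver allocated memory".
        free, total = torch.cuda.mem_get_info(device)
        return {
            "caching_allocator_bytes": torch.cuda.memory_allocated(device),
            "driver_allocated_bytes": total - free,
        }

    def check_for_leak(run_test, device: int = 0) -> None:
        torch.cuda.synchronize(device)
        before = snapshot(device)
        run_test()                      # the test body under suspicion
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()        # drop cached blocks so genuine leaks stand out
        after = snapshot(device)
        if after["caching_allocator_bytes"] > before["caching_allocator_bytes"]:
            raise RuntimeError(f"possible CUDA memory leak: {before} -> {after}")

The repro line printed in the log (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda) enables the real check when re-running the single test locally.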
2025-12-04T09:28:45.0249830Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a3dc994784795bc1.xml 2025-12-04T09:28:45.0250740Z ============================= test session starts ============================== 2025-12-04T09:28:45.0251313Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.0251852Z cachedir: .pytest_cache 2025-12-04T09:28:45.0252483Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.0253179Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.0253486Z configfile: pytest.ini 2025-12-04T09:28:45.0254133Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.0254924Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.0255850Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda 2025-12-04T09:28:45.0257081Z Running 1 items in this shard 2025-12-04T09:28:45.0257305Z 2025-12-04T09:28:45.0258264Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda I1204 09:26:16.104000 38097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 38149 2025-12-04T09:28:45.0259845Z I1204 09:26:16.104000 38097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 38150 2025-12-04T09:28:45.0260977Z I1204 09:26:16.105000 38097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 38151 2025-12-04T09:28:45.0262096Z I1204 09:26:16.106000 38097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 38152 2025-12-04T09:28:45.0263979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:45.0265483Z self.encoder = TransformerEncoder( 2025-12-04T09:28:45.0266980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:45.0268467Z self.encoder = TransformerEncoder( 2025-12-04T09:28:45.0270032Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:45.0271441Z self.encoder = TransformerEncoder( 2025-12-04T09:28:45.0272871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:45.0274281Z self.encoder = TransformerEncoder( 2025-12-04T09:28:45.0276186Z 
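The UserWarning repeated just above about enable_nested_tensor comes from building the TransformerEncoder on top of an encoder layer whose self-attention is not batch_first. A minimal sketch of the construction the warning message asks for, with made-up dimensions (d_model, nhead, num_layers are illustrative only):

    import torch.nn as nn

    # batch_first=True on the layer is what the warning above suggests.
    encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=6, enable_nested_tensor=True)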
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0277985Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0279781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0281568Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0283352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0285134Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0286970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0288757Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0289912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.0291024Z return func(*args, **kwargs) 2025-12-04T09:28:45.0292078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0293171Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:45.0294237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0295319Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:45.0296437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0297814Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:45.0299014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:45.0300272Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:45.0301497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0302762Z fsdp_model = FSDP( 2025-12-04T09:28:45.0303938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0305166Z fsdp_model = FSDP( 2025-12-04T09:28:45.0306324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0307559Z fsdp_model = FSDP( 2025-12-04T09:28:45.0308731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0309946Z fsdp_model = FSDP( 2025-12-04T09:28:45.0314144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0318575Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0323611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:45.0328607Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0333728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0338676Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0343712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0348815Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0350260Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0351364Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0352523Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0353605Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0354684Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0355781Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0356868Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0357962Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0358615Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0359640Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0361139Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0362598Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0364067Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0365460Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0366809Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0368254Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0369671Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0371092Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0372512Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0373906Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0375288Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0376971Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0379208Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 0. CUDA driver allocated memory was 714014720 and is now 804192256. 
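Earlier in this run, FSDP warned that the `device_id` argument was a bare `cuda` device with no explicit index. A minimal sketch of the two remedies the warning itself names, either calling torch.cuda.set_device() before wrapping or passing a device with an explicit index; `model` and `rank` are placeholders for the test's actual setup, and an initialized default process group is assumed:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_with_explicit_device(model: nn.Module, rank: int) -> FSDP:
        # Remedy 1 from the warning: make the current device explicit up front.
        torch.cuda.set_device(rank)
        # Remedy 2 from the warning: pass a device with an explicit index
        # instead of the bare "cuda" string.
        return FSDP(model, device_id=torch.device("cuda", rank))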
2025-12-04T09:28:45.0381305Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0382485Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0384296Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0385828Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0387056Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0388465Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.0389740Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0390738Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0392229Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0393697Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0395188Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0396555Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0397911Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0399339Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0400957Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0402461Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0403951Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0405423Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0406896Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0408418Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0410528Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 1. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:45.0412446Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0413540Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0415245Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0416914Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0418151Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0419557Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.0420715Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0422027Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0423713Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0425438Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0427077Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0428661Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0430177Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0431775Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0433506Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:28:45.0434926Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0436346Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0437722Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0439113Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0440529Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0442510Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 487522304 and is now 695140352. 2025-12-04T09:28:45.0444322Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0445367Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0446986Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0448331Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0449616Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0450948Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.0452026Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0453095Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0454660Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0456304Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0458130Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0459667Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] 
fn() 2025-12-04T09:28:45.0461170Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0462773Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0464374Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0465975Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0467572Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0469222Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0470744Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0472471Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0474523Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 2. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:45.0476326Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0477366Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0478995Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0480517Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0481672Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0482992Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.0483728Z dist init r=0, world=4 2025-12-04T09:28:45.0483998Z dist init r=1, world=4 2025-12-04T09:28:45.0484264Z dist init r=3, world=4 2025-12-04T09:28:45.0484514Z dist init r=2, world=4 2025-12-04T09:28:45.0485809Z [rank0]:[W1204 09:26:35.302210914 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.0487121Z FAILED [21.2721s] [100%] 2025-12-04T09:28:45.0487322Z 2025-12-04T09:28:45.0487478Z =================================== FAILURES =================================== 2025-12-04T09:28:45.0503459Z __________________ TestClipGradNormCUDA.test_ddp_parity_cuda ___________________ 2025-12-04T09:28:45.0504159Z Traceback (most recent call last): 2025-12-04T09:28:45.0505000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.0505814Z self._join_processes(fn) 2025-12-04T09:28:45.0506617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.0507499Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.0508388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.0509450Z raise RuntimeError(error) 2025-12-04T09:28:45.0509868Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.0510339Z Traceback (most recent call last): 2025-12-04T09:28:45.0511077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0511830Z getattr(self, test_name)() 2025-12-04T09:28:45.0512531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0513367Z fn() 2025-12-04T09:28:45.0513950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0514617Z method(*args, **kwargs) 2025-12-04T09:28:45.0515259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0515936Z method(*args, **kwargs) 2025-12-04T09:28:45.0516682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0517351Z with policy(): 2025-12-04T09:28:45.0517963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0518651Z raise RuntimeError(msg) 2025-12-04T09:28:45.0519785Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 487522304 and is now 695140352. 2025-12-04T09:28:45.0521053Z 2025-12-04T09:28:45.0521425Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0522358Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0523073Z 2025-12-04T09:28:45.0523346Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0523758Z 2025-12-04T09:28:45.0523763Z 2025-12-04T09:28:45.0523993Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.0524626Z Process 3 terminated with exit code 10, terminating remaining processes. 
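The ProcessGroupNCCL warning above notes that destroy_process_group() was not called before the worker processes exited. A minimal sketch of the teardown that warning (and the linked shutdown docs) refer to; whether a final barrier is wanted depends on the test, so it is marked optional:

    import torch.distributed as dist

    def teardown() -> None:
        if dist.is_initialized():
            dist.barrier()                # optional: let all ranks finish outstanding work
            dist.destroy_process_group()  # releases communicator resources before exit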
2025-12-04T09:28:45.0525926Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a3dc994784795bc1.xml - 2025-12-04T09:28:45.0527116Z =========================== short test summary info ============================ 2025-12-04T09:28:45.0528267Z FAILED [21.2721s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.0529266Z Traceback (most recent call last): 2025-12-04T09:28:45.0530067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0530905Z getattr(self, test_name)() 2025-12-04T09:28:45.0531672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0532453Z fn() 2025-12-04T09:28:45.0533090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0534035Z method(*args, **kwargs) 2025-12-04T09:28:45.0534672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0535343Z method(*args, **kwargs) 2025-12-04T09:28:45.0535965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0536886Z with policy(): 2025-12-04T09:28:45.0537580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0538342Z raise RuntimeError(msg) 2025-12-04T09:28:45.0539630Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 487522304 and is now 695140352. 2025-12-04T09:28:45.0540845Z 2025-12-04T09:28:45.0541064Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0541990Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0542692Z 2025-12-04T09:28:45.0542967Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0543543Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
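The FutureWarnings repeated throughout this log say the `NO_SHARD` sharding strategy is deprecated and recommend DistributedDataParallel instead. A minimal sketch of that substitution, assuming the default process group is already initialized as in these tests; `model` and `rank` are placeholders:

    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_without_sharding(model: nn.Module, rank: int) -> DDP:
        # DDP keeps a full replica of the model per rank, which is what NO_SHARD did.
        return DDP(model.to(rank), device_ids=[rank])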
2025-12-04T09:28:45.0544133Z ======================= 1 failed, 3 deselected in 21.29s ======================= 2025-12-04T09:28:45.0544563Z Got exit code 1 2025-12-04T09:28:45.0545200Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda 2025-12-04T09:28:45.0546230Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:45.0547496Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-b1d6139c1033a518.xml 2025-12-04T09:28:45.0548522Z ============================= test session starts ============================== 2025-12-04T09:28:45.0549347Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.0549882Z cachedir: .pytest_cache 2025-12-04T09:28:45.0550514Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.0551218Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.0551525Z configfile: pytest.ini 2025-12-04T09:28:45.0552175Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.0552965Z collecting ... collected 4 items / 1 deselected / 3 selected 2025-12-04T09:28:45.0553387Z stepcurrent: skipping 1 already run items. 2025-12-04T09:28:45.0553735Z Running 3 items in this shard 2025-12-04T09:28:45.0553921Z 2025-12-04T09:28:45.0554820Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda I1204 09:26:42.094000 38526 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 38578 2025-12-04T09:28:45.0556297Z I1204 09:26:42.095000 38526 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 38579 2025-12-04T09:28:45.0557304Z I1204 09:26:42.095000 38526 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 38580 2025-12-04T09:28:45.0558351Z I1204 09:26:42.096000 38526 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 38581 2025-12-04T09:28:45.0560455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0562262Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0564052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:28:45.0565831Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0567625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0569411Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0571238Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0573025Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0574185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.0575285Z return func(*args, **kwargs) 2025-12-04T09:28:45.0576459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0577879Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0579151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0580401Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0581663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0582930Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0584180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0585464Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0586694Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0587939Z fsdp_model = FSDP( 2025-12-04T09:28:45.0589272Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0590386Z fsdp_model = FSDP( 2025-12-04T09:28:45.0591453Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:45.0592662Z fsdp_model = FSDP( 2025-12-04T09:28:45.0593671Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0594721Z fsdp_model = FSDP( 2025-12-04T09:28:45.0595298Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0596317Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0597810Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0599273Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0600737Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0602154Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0603507Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0604931Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0606340Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0607765Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0609189Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0610571Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0611957Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0613368Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0615339Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 0. CUDA driver allocated memory was 714014720 and is now 762249216. 
2025-12-04T09:28:45.0617566Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0618751Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0620628Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0622366Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0623606Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0625022Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.0626175Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0627297Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0628984Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0630641Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0633031Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0634411Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0635747Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0637172Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0638601Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0640026Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0641433Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0642814Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0644199Z [rank2]:E1204 09:26:49.444000 38580 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0645663Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0647620Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 2. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.0649476Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0650520Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0652183Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0653582Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0654685Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0655928Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.0657230Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0658359Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0660047Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0661691Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0663398Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0664945Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0666460Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0668052Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0669769Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0671191Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0672603Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0673978Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0675359Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0676806Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0678761Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 1. CUDA driver allocated memory was 604962816 and is now 653197312. 2025-12-04T09:28:45.0680625Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0681679Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0683337Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0684730Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0685827Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0687083Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.0688102Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0689093Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0690639Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0692110Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0693576Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in 
wrapper 2025-12-04T09:28:45.0694939Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0696328Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0698059Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0699669Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0701282Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0702883Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0704426Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0706032Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0707664Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0709875Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 3. CUDA driver allocated memory was 495910912 and is now 653197312. 
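The RuntimeError above is raised by the CUDA memory leak check that the repro command enables via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: it compares per-device memory counters taken before and after the test and reports both a caching-allocator figure and a driver-level figure. The sketch below is a minimal, assumed approximation of that before/after comparison using public torch.cuda counters; the helper names (snapshot, check_for_leak) are illustrative and this is not the actual checker in common_utils.py.

import torch

def snapshot(device: int) -> tuple[int, int]:
    # Return (caching-allocator bytes, driver-level used bytes) for one device.
    torch.cuda.synchronize(device)
    alloc = torch.cuda.memory_allocated(device)    # caching allocator counter
    free, total = torch.cuda.mem_get_info(device)  # driver-level free/total
    return alloc, total - free

def check_for_leak(fn, device: int = 0) -> None:
    # Hedged sketch only: the real check in torch.testing._internal differs.
    before_alloc, before_driver = snapshot(device)
    fn()
    torch.cuda.empty_cache()  # release cached blocks before re-measuring
    after_alloc, after_driver = snapshot(device)
    if after_alloc > before_alloc and after_driver > before_driver:
        raise RuntimeError(
            f"possible leak on device {device}: allocator {before_alloc} -> "
            f"{after_alloc} bytes, driver {before_driver} -> {after_driver} bytes"
        )

In the failure above both counters grew (allocator 512 -> 92672 bytes, driver roughly 496 MB -> 653 MB on device 3), which is the condition the check treats as a confirmed leak.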
2025-12-04T09:28:45.0711708Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0712735Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0714400Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0715795Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0716890Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0718138Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.0718834Z dist init r=3, world=4 2025-12-04T09:28:45.0719091Z dist init r=0, world=4 2025-12-04T09:28:45.0719344Z dist init r=2, world=4 2025-12-04T09:28:45.0719582Z dist init r=1, world=4 2025-12-04T09:28:45.0720974Z [rank0]:[W1204 09:26:49.480385136 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.0722523Z FAILED [9.7106s] [ 33%] 2025-12-04T09:28:45.0722700Z 2025-12-04T09:28:45.0722865Z =================================== FAILURES =================================== 2025-12-04T09:28:45.0723428Z ______________ TestClipGradNormCUDA.test_low_precision_grads_cuda ______________ 2025-12-04T09:28:45.0723971Z Traceback (most recent call last): 2025-12-04T09:28:45.0724765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.0725558Z self._join_processes(fn) 2025-12-04T09:28:45.0726358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.0727231Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.0728127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.0728984Z raise RuntimeError(error) 2025-12-04T09:28:45.0729436Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.0729934Z Traceback (most recent call last): 2025-12-04T09:28:45.0730718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0731505Z getattr(self, test_name)() 2025-12-04T09:28:45.0732256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0733092Z fn() 2025-12-04T09:28:45.0733814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0734494Z method(*args, **kwargs) 2025-12-04T09:28:45.0735136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T09:28:45.0735855Z method(*args, **kwargs) 2025-12-04T09:28:45.0736541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0737443Z with policy(): 2025-12-04T09:28:45.0738125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0738881Z raise RuntimeError(msg) 2025-12-04T09:28:45.0740215Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 3. CUDA driver allocated memory was 495910912 and is now 653197312. 2025-12-04T09:28:45.0741481Z 2025-12-04T09:28:45.0741701Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0742670Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0743418Z 2025-12-04T09:28:45.0743697Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0744101Z 2025-12-04T09:28:45.0744106Z 2025-12-04T09:28:45.0744329Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.0744963Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:45.0746268Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-b1d6139c1033a518.xml - 2025-12-04T09:28:45.0747474Z =========================== short test summary info ============================ 2025-12-04T09:28:45.0748748Z FAILED [9.7106s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.0749818Z Traceback (most recent call last): 2025-12-04T09:28:45.0750526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0751249Z getattr(self, test_name)() 2025-12-04T09:28:45.0751909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0752605Z fn() 2025-12-04T09:28:45.0753187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0753864Z method(*args, **kwargs) 2025-12-04T09:28:45.0754501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0755181Z method(*args, **kwargs) 2025-12-04T09:28:45.0755819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0756481Z with policy(): 2025-12-04T09:28:45.0757090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0757774Z raise RuntimeError(msg) 2025-12-04T09:28:45.0758934Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 3. 
CUDA driver allocated memory was 495910912 and is now 653197312. 2025-12-04T09:28:45.0760046Z 2025-12-04T09:28:45.0760267Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0761124Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0761791Z 2025-12-04T09:28:45.0762045Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0762599Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.0763035Z ======================= 1 failed, 1 deselected in 9.73s ======================== 2025-12-04T09:28:45.0763412Z Got exit code 1 2025-12-04T09:28:45.0763654Z Retrying single test... 2025-12-04T09:28:45.0764446Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-ebdc3db326996caa.xml 2025-12-04T09:28:45.0765356Z ============================= test session starts ============================== 2025-12-04T09:28:45.0765946Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.0766480Z cachedir: .pytest_cache 2025-12-04T09:28:45.0767094Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.0767797Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.0768115Z configfile: pytest.ini 2025-12-04T09:28:45.0768751Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.0769544Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.0770467Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda 2025-12-04T09:28:45.0771287Z Running 1 items in this shard 2025-12-04T09:28:45.0771482Z 2025-12-04T09:28:45.0772359Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda I1204 09:26:56.524000 38863 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 38915 2025-12-04T09:28:45.0773842Z I1204 09:26:56.524000 38863 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 38916 2025-12-04T09:28:45.0774858Z I1204 09:26:56.525000 38863 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 38917 2025-12-04T09:28:45.0775869Z I1204 09:26:56.526000 38863 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 38918 2025-12-04T09:28:45.0778383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:28:45.0780421Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0782444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0784455Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0786460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0788501Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0790521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0792331Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0793485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.0794598Z return func(*args, **kwargs) 2025-12-04T09:28:45.0795670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0796782Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0797909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0799029Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0800141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0801249Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0802362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0803484Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0804618Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:45.0805682Z fsdp_model = FSDP( 2025-12-04T09:28:45.0806681Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0807727Z fsdp_model = FSDP( 2025-12-04T09:28:45.0808735Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0809776Z fsdp_model = FSDP( 2025-12-04T09:28:45.0810972Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0812084Z fsdp_model = FSDP( 2025-12-04T09:28:45.0812665Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0813718Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0815284Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0817081Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0818756Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0820282Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0821994Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0823593Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0825170Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0826752Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0828328Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0829865Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0831414Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0833106Z [rank1]:E1204 09:27:03.798000 
38916 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0835337Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 1. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.0837321Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0838443Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0840239Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0841750Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0842930Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0844273Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.0845367Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0846450Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0848066Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0849684Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0851347Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0852806Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0854211Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0855693Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0857439Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0859021Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0860607Z 
[rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0862145Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0863689Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0865274Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0867505Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 2. CUDA driver allocated memory was 609157120 and is now 653197312. 2025-12-04T09:28:45.0869620Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0870707Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0872446Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0873911Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0875054Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0876370Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.0877425Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0878462Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0880055Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0881598Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0883149Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0884573Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0885961Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0887450Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0888939Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0890423Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0891923Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0893378Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0894843Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0896834Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0899155Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 0. CUDA driver allocated memory was 714014720 and is now 762249216. 
2025-12-04T09:28:45.0901206Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0902373Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0904242Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0905798Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0907015Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0908418Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.0909604Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0910990Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0912756Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0914363Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0915944Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0917418Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0918875Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0920421Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0922329Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0923919Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0925510Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0927066Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0928708Z [rank3]:E1204 09:27:03.800000 38918 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0930299Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0932487Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 3. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.0934740Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0935826Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0937861Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0939433Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0940656Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0942057Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.0942888Z dist init r=1, world=4 2025-12-04T09:28:45.0943158Z dist init r=0, world=4 2025-12-04T09:28:45.0943428Z dist init r=2, world=4 2025-12-04T09:28:45.0943696Z dist init r=3, world=4 2025-12-04T09:28:45.0945011Z [rank0]:[W1204 09:27:04.819543431 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.0946444Z FAILED [8.9971s] [100%] 2025-12-04T09:28:45.0946626Z 2025-12-04T09:28:45.0946772Z =================================== FAILURES =================================== 2025-12-04T09:28:45.0947335Z ______________ TestClipGradNormCUDA.test_low_precision_grads_cuda ______________ 2025-12-04T09:28:45.0947861Z Traceback (most recent call last): 2025-12-04T09:28:45.0948729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.0949433Z self._join_processes(fn) 2025-12-04T09:28:45.0950125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.0950890Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.0951660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.0952417Z raise RuntimeError(error) 2025-12-04T09:28:45.0952802Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.0953234Z Traceback (most recent call last): 2025-12-04T09:28:45.0953922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0954621Z getattr(self, test_name)() 2025-12-04T09:28:45.0955273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0955955Z fn() 2025-12-04T09:28:45.0956521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0957221Z method(*args, **kwargs) 2025-12-04T09:28:45.0957855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0958522Z method(*args, **kwargs) 2025-12-04T09:28:45.0959147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0959791Z with policy(): 2025-12-04T09:28:45.0960389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0961065Z raise RuntimeError(msg) 2025-12-04T09:28:45.0962222Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 1. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.0963333Z 2025-12-04T09:28:45.0963525Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0964374Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0965030Z 2025-12-04T09:28:45.0965270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0965620Z 2025-12-04T09:28:45.0965625Z 2025-12-04T09:28:45.0965830Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.0966377Z Process 1 terminated with exit code 10, terminating remaining processes. 
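The ProcessGroupNCCL warning at the start of this attempt ("destroy_process_group() was not called before program exit, which can leak resources") names one plausible contributor to the driver-memory growth the leak check reports. Below is a minimal, hedged sketch of the explicit teardown the warning asks for; it assumes the usual torchrun-style environment variables (RANK, MASTER_ADDR, MASTER_PORT) are set, and the main() structure is illustrative rather than taken from the test.

import os
import torch
import torch.distributed as dist

def main() -> None:
    # Assumption: launched with torchrun, so RANK etc. are in the environment.
    rank = int(os.environ["RANK"])
    torch.cuda.set_device(rank % torch.cuda.device_count())
    dist.init_process_group(backend="nccl")
    try:
        dist.barrier()  # ... test or training body would go here ...
    finally:
        dist.destroy_process_group()  # explicit teardown the warning asks for

if __name__ == "__main__":
    main()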
2025-12-04T09:28:45.0967533Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-ebdc3db326996caa.xml - 2025-12-04T09:28:45.0968630Z =========================== short test summary info ============================ 2025-12-04T09:28:45.0969607Z FAILED [8.9971s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.0970540Z Traceback (most recent call last): 2025-12-04T09:28:45.0971228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0971917Z getattr(self, test_name)() 2025-12-04T09:28:45.0972570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0973245Z fn() 2025-12-04T09:28:45.0973819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0974487Z method(*args, **kwargs) 2025-12-04T09:28:45.0975103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0975773Z method(*args, **kwargs) 2025-12-04T09:28:45.0976480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0977364Z with policy(): 2025-12-04T09:28:45.0978024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0978780Z raise RuntimeError(msg) 2025-12-04T09:28:45.0980081Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 1. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.0981317Z 2025-12-04T09:28:45.0981533Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0982473Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0983282Z 2025-12-04T09:28:45.0983550Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0984123Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.0984608Z ======================= 1 failed, 3 deselected in 9.02s ======================== 2025-12-04T09:28:45.0985010Z Got exit code 1 2025-12-04T09:28:45.0985268Z Retrying single test... 
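Each retry also emits the UserWarning from _init_utils.py about `device_id` being the bare device "cuda" without an index. The sketch below shows the two remedies the warning itself names, calling torch.cuda.set_device() before constructing FSDP and passing an indexed device as device_id; the wrap_model helper and the rank-to-device mapping are assumptions, and a process group must already be initialized.

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model: nn.Module) -> FSDP:
    # Assumes dist.init_process_group() has already run in this process.
    rank = dist.get_rank()
    device = torch.device("cuda", rank % torch.cuda.device_count())
    torch.cuda.set_device(device)         # remedy 1: set the current device explicitly
    return FSDP(model, device_id=device)  # remedy 2: pass an indexed device, not bare "cuda"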
2025-12-04T09:28:45.0986162Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-c42bc725a7562377.xml 2025-12-04T09:28:45.0987154Z ============================= test session starts ============================== 2025-12-04T09:28:45.0987812Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.0988401Z cachedir: .pytest_cache 2025-12-04T09:28:45.0989285Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.0989965Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.0990267Z configfile: pytest.ini 2025-12-04T09:28:45.0990898Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.0991666Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.0992573Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda 2025-12-04T09:28:45.0993393Z Running 1 items in this shard 2025-12-04T09:28:45.0993603Z 2025-12-04T09:28:45.0994495Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda I1204 09:27:10.373000 39200 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 39252 2025-12-04T09:28:45.0995919Z I1204 09:27:10.374000 39200 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 39253 2025-12-04T09:28:45.0996951Z I1204 09:27:10.375000 39200 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 39254 2025-12-04T09:28:45.0997952Z I1204 09:27:10.376000 39200 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 39255 2025-12-04T09:28:45.1000037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1001832Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1003604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1005393Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1007166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:28:45.1008943Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1010816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1012590Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1013723Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.1014821Z return func(*args, **kwargs) 2025-12-04T09:28:45.1015889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1017285Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.1018521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1019770Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.1021191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1022442Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.1023750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1025008Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.1026220Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1027449Z fsdp_model = FSDP( 2025-12-04T09:28:45.1028556Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1029739Z fsdp_model = FSDP( 2025-12-04T09:28:45.1030863Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1032047Z fsdp_model = FSDP( 2025-12-04T09:28:45.1033229Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:45.1034278Z fsdp_model = FSDP( 2025-12-04T09:28:45.1034834Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1035828Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1037299Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1038756Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1040270Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1041622Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1042953Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1044346Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1045758Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1047173Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1048588Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1049969Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1051338Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1052791Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1054753Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 0. CUDA driver allocated memory was 714014720 and is now 762249216. 
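The FutureWarning repeated above notes that the NO_SHARD sharding strategy is deprecated and suggests DistributedDataParallel instead. A hedged sketch of that suggested replacement follows; model and rank are illustrative placeholders and an already-initialized NCCL process group is assumed.

import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_without_sharding(model: nn.Module, rank: int) -> DDP:
    # Assumes an NCCL process group is already initialized for this rank.
    model = model.to(torch.device("cuda", rank))
    return DDP(model, device_ids=[rank])  # full replication with gradient all-reduce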
2025-12-04T09:28:45.1056832Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1057995Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1059842Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1061406Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1062630Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1064038Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1065169Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1066288Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1067958Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1069629Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1071131Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1072486Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1073818Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1075226Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1076637Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1078050Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1079450Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1080820Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1082194Z [rank1]:E1204 09:27:17.706000 39253 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1083632Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1085577Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 1. CUDA driver allocated memory was 602865664 and is now 653197312. 2025-12-04T09:28:45.1087424Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1088453Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1090101Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1091490Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1092576Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1093810Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1094814Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1095808Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1097670Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1099310Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1100949Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1102466Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1103966Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1105553Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1107138Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1108722Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1110245Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1111790Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1113281Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1114799Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1116864Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 2. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.1118797Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1120061Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1122191Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1123744Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1124967Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1126367Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1127501Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1128618Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1130386Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1132039Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1133811Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in 
wrapper 2025-12-04T09:28:45.1135377Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1137033Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1138642Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1140235Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1141820Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1143397Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1144979Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1146530Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1148163Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1150434Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 3. CUDA driver allocated memory was 516882432 and is now 653197312. 
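Each rank prints the same repro instruction: run the single failing test from the repo root with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, optionally silencing the repro banner with PYTORCH_PRINT_REPRO_ON_FAILURE=0. A small sketch that drives that command from Python, assuming the working directory is a PyTorch checkout:

```python
import os
import subprocess
import sys

# Reproduce the failing test with the CUDA mem-leak check enabled,
# exactly as the log suggests. Assumes cwd is the PyTorch repo root.
env = dict(os.environ)
env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"
# Uncomment to suppress the "To execute this test..." banner on failure:
# env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"

cmd = [
    sys.executable,
    "test/distributed/fsdp/test_fsdp_clip_grad_norm.py",
    "TestClipGradNormCUDA.test_low_precision_grads_cuda",
]
result = subprocess.run(cmd, env=env)
print("exit code:", result.returncode)
```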
2025-12-04T09:28:45.1152258Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1153288Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1154939Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1156318Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1157399Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1158647Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1159330Z dist init r=0, world=4 2025-12-04T09:28:45.1159573Z dist init r=3, world=4 2025-12-04T09:28:45.1159816Z dist init r=1, world=4 2025-12-04T09:28:45.1160110Z dist init r=2, world=4 2025-12-04T09:28:45.1161287Z [rank0]:[W1204 09:27:18.722161313 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.1162507Z FAILED [9.1019s] [100%] 2025-12-04T09:28:45.1162658Z 2025-12-04T09:28:45.1162796Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1163290Z ______________ TestClipGradNormCUDA.test_low_precision_grads_cuda ______________ 2025-12-04T09:28:45.1163763Z Traceback (most recent call last): 2025-12-04T09:28:45.1164460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1165155Z self._join_processes(fn) 2025-12-04T09:28:45.1165866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1166635Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1167415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1168161Z raise RuntimeError(error) 2025-12-04T09:28:45.1168555Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1168985Z Traceback (most recent call last): 2025-12-04T09:28:45.1169677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1170399Z getattr(self, test_name)() 2025-12-04T09:28:45.1171058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1171746Z fn() 2025-12-04T09:28:45.1172310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1173004Z method(*args, **kwargs) 2025-12-04T09:28:45.1173632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T09:28:45.1174300Z method(*args, **kwargs) 2025-12-04T09:28:45.1174917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1175567Z with policy(): 2025-12-04T09:28:45.1176156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1177106Z raise RuntimeError(msg) 2025-12-04T09:28:45.1178479Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 0. CUDA driver allocated memory was 714014720 and is now 762249216. 2025-12-04T09:28:45.1179734Z 2025-12-04T09:28:45.1179946Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1180891Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1181635Z 2025-12-04T09:28:45.1181907Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1182308Z 2025-12-04T09:28:45.1182313Z 2025-12-04T09:28:45.1182532Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1183145Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:45.1184445Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-c42bc725a7562377.xml - 2025-12-04T09:28:45.1185700Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1186792Z FAILED [9.1019s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1187820Z Traceback (most recent call last): 2025-12-04T09:28:45.1188603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1189444Z getattr(self, test_name)() 2025-12-04T09:28:45.1190098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1190780Z fn() 2025-12-04T09:28:45.1191353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1192010Z method(*args, **kwargs) 2025-12-04T09:28:45.1192642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1193310Z method(*args, **kwargs) 2025-12-04T09:28:45.1193931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1194578Z with policy(): 2025-12-04T09:28:45.1195173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1195843Z raise RuntimeError(msg) 2025-12-04T09:28:45.1196994Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 0. 
CUDA driver allocated memory was 714014720 and is now 762249216. 2025-12-04T09:28:45.1198122Z 2025-12-04T09:28:45.1198311Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1199159Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1199843Z 2025-12-04T09:28:45.1200082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1200597Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1201029Z ======================= 1 failed, 3 deselected in 9.12s ======================== 2025-12-04T09:28:45.1201391Z Got exit code 1 2025-12-04T09:28:45.1201999Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda 2025-12-04T09:28:45.1202937Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:45.1204054Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-4818210284e31d5e.xml 2025-12-04T09:28:45.1204963Z ============================= test session starts ============================== 2025-12-04T09:28:45.1205543Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1206061Z cachedir: .pytest_cache 2025-12-04T09:28:45.1206683Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1207367Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1207665Z configfile: pytest.ini 2025-12-04T09:28:45.1208302Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1209083Z collecting ... collected 4 items / 2 deselected / 2 selected 2025-12-04T09:28:45.1209505Z stepcurrent: skipping 2 already run items. 2025-12-04T09:28:45.1209835Z Running 2 items in this shard 2025-12-04T09:28:45.1210028Z 2025-12-04T09:28:45.1210932Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda I1204 09:27:24.383000 39537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 39589 2025-12-04T09:28:45.1212333Z I1204 09:27:24.384000 39537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 39590 2025-12-04T09:28:45.1213341Z I1204 09:27:24.385000 39537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 39591 2025-12-04T09:28:45.1214332Z I1204 09:27:24.386000 39537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 39592 2025-12-04T09:28:45.1216486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
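The UserWarning from _init_utils.py is advisory: the test passes `device_id` as the bare string "cuda" without an index, so FSDP falls back to the current device. The two remedies the warning itself suggests look roughly like this (a sketch with a toy module; `rank` would come from the launcher, and a default process group is assumed to be initialized already):

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(rank: int) -> FSDP:
    # Assumes torch.distributed.init_process_group() was already called.
    # Remedy 1: make the current device explicit before FSDP initialization.
    torch.cuda.set_device(rank)

    model = nn.Linear(8, 8)

    # Remedy 2: pass an explicit device index (or a torch.device with an index)
    # instead of the bare "cuda" string that triggers the warning.
    return FSDP(model, device_id=torch.device("cuda", rank))
```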
2025-12-04T09:28:45.1218660Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1220671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1222848Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1224855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1226952Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1228959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1230953Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1232243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.1233528Z return func(*args, **kwargs) 2025-12-04T09:28:45.1235233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1237222Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1239103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1240993Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1242942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
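The barrier() warning here and the earlier ProcessGroupNCCL shutdown warning both point at process-group lifecycle hygiene: bind the group to a device at init time and tear it down explicitly before exit. A minimal per-rank sketch, assuming the usual torchrun environment variables (LOCAL_RANK, RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT) are set, and noting that init_process_group's device_id argument only exists in newer PyTorch releases:

```python
import os
import torch
import torch.distributed as dist

def main() -> None:
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun (single-node assumption)
    torch.cuda.set_device(local_rank)

    # Binding the group to an explicit device avoids the
    # "barrier(): using the device under current context" warning
    # (device_id is available in recent PyTorch versions).
    dist.init_process_group(
        backend="nccl",
        device_id=torch.device("cuda", local_rank),
    )
    try:
        dist.barrier()
        # ... training / test body ...
    finally:
        # Explicit teardown avoids the ProcessGroupNCCL
        # "destroy_process_group() was not called before program exit" warning.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()
```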
2025-12-04T09:28:45.1243100Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1244710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1244862Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1245801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.1245918Z return func(*args, **kwargs) 2025-12-04T09:28:45.1246353Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1246870Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1247816Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1248325Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1249264Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1249770Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1250634Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1251064Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1251927Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1252360Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1253211Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1253613Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1254463Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1254908Z [rank0]:E1204 09:27:31.326000 
39589 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1256395Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T09:28:45.1256910Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1257566Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1258604Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1258976Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1259694Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1260248Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1260698Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1261232Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1262261Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1262769Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1263794Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1264189Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1265151Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1265640Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1266610Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1267095Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1268054Z [rank1]:E1204 
09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1268507Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1269511Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1270005Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1271362Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 602865664 and is now 623837184. 2025-12-04T09:28:45.1271691Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1272273Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1273185Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1273518Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1274151Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1274645Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1275045Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1275579Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1276469Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1276942Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1277831Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1278183Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1279051Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1279488Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1280353Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1280787Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1281639Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1282046Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1282949Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1283396Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1284747Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 477036544 and is now 623837184. 
2025-12-04T09:28:45.1285079Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1285666Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1286576Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1286906Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1287538Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1288028Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1288454Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1288937Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1289847Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1290298Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1291183Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1291536Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1292399Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1292832Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1293688Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1294117Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1294968Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1295423Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1296335Z [rank2]:E1204 09:27:31.329000 39591 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1296964Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1298482Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:28:45.1298859Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1299517Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1300539Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1300909Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1301622Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1302215Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1302316Z dist init r=3, world=4 2025-12-04T09:28:45.1302414Z dist init r=0, world=4 2025-12-04T09:28:45.1302527Z dist init r=1, world=4 2025-12-04T09:28:45.1302662Z dist init r=2, world=4 2025-12-04T09:28:45.1302757Z FAILED [8.6116s] [ 50%] 2025-12-04T09:28:45.1302775Z 2025-12-04T09:28:45.1302922Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1303188Z _________________ TestClipGradNormCUDA.test_no_gradients_cuda __________________ 2025-12-04T09:28:45.1303323Z Traceback (most recent call last): 2025-12-04T09:28:45.1303868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1303977Z self._join_processes(fn) 2025-12-04T09:28:45.1304570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1304713Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1305329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1305444Z raise RuntimeError(error) 2025-12-04T09:28:45.1305678Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1305805Z Traceback (most recent call last): 2025-12-04T09:28:45.1306343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1306455Z getattr(self, test_name)() 2025-12-04T09:28:45.1306998Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1307091Z fn() 2025-12-04T09:28:45.1307610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1307712Z method(*args, **kwargs) 2025-12-04T09:28:45.1308778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1308995Z method(*args, **kwargs) 2025-12-04T09:28:45.1309444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1309531Z with policy(): 2025-12-04T09:28:45.1309991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1310089Z raise RuntimeError(msg) 2025-12-04T09:28:45.1311046Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T09:28:45.1311055Z 2025-12-04T09:28:45.1311247Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1311762Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1311778Z 2025-12-04T09:28:45.1312011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1312016Z 2025-12-04T09:28:45.1312020Z 2025-12-04T09:28:45.1312216Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1312459Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:45.1313251Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-4818210284e31d5e.xml - 2025-12-04T09:28:45.1313438Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1314100Z FAILED [8.6116s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1314208Z Traceback (most recent call last): 2025-12-04T09:28:45.1314733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1314828Z getattr(self, test_name)() 2025-12-04T09:28:45.1315301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1315386Z fn() 2025-12-04T09:28:45.1315843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1315945Z method(*args, **kwargs) 2025-12-04T09:28:45.1316393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1316490Z method(*args, **kwargs) 2025-12-04T09:28:45.1316946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1317040Z with policy(): 2025-12-04T09:28:45.1317501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1317600Z raise RuntimeError(msg) 2025-12-04T09:28:45.1318552Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T09:28:45.1318557Z 2025-12-04T09:28:45.1318756Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1319267Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1319273Z 2025-12-04T09:28:45.1319516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1319729Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1319892Z ======================= 1 failed, 2 deselected in 8.63s ======================== 2025-12-04T09:28:45.1319986Z Got exit code 1 2025-12-04T09:28:45.1320081Z Retrying single test... 
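After the first failure the harness reruns the single failing test; because the job runs with continue-through-error, a test that fails again is recorded as "FAILED CONSISTENTLY" and the rest of the shard keeps running instead of aborting. A rough sketch of that retry-then-continue control flow (the helper names here are illustrative, not the real run_test.py functions):

```python
import subprocess
import sys

def run_pytest(node_id: str) -> int:
    """Run a single pytest node id and return its exit code."""
    return subprocess.run([sys.executable, "-m", "pytest", "-x", node_id]).returncode

def run_with_retry(node_id: str, retries: int = 1) -> bool:
    """Hypothetical helper mirroring the log's behaviour: retry a failed test,
    mark it as consistently failing if it still fails, and keep going."""
    for attempt in range(retries + 1):
        if run_pytest(node_id) == 0:
            return True
        print(f"Got exit code 1 (attempt {attempt + 1})")
    print(f"FAILED CONSISTENTLY: {node_id}")
    # continue-through-error: report the failure but let the rest of the shard run.
    return False
```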
2025-12-04T09:28:45.1320711Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-1b5186457c75b3fb.xml 2025-12-04T09:28:45.1321000Z ============================= test session starts ============================== 2025-12-04T09:28:45.1321495Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1321615Z cachedir: .pytest_cache 2025-12-04T09:28:45.1322130Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1322250Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1322365Z configfile: pytest.ini 2025-12-04T09:28:45.1322906Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1323111Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.1323784Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda 2025-12-04T09:28:45.1323898Z Running 1 items in this shard 2025-12-04T09:28:45.1323903Z 2025-12-04T09:28:45.1324865Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda I1204 09:27:37.803000 39850 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 39902 2025-12-04T09:28:45.1325426Z I1204 09:27:37.804000 39850 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 39903 2025-12-04T09:28:45.1325941Z I1204 09:27:37.805000 39850 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 39904 2025-12-04T09:28:45.1326488Z I1204 09:27:37.806000 39850 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 39905 2025-12-04T09:28:45.1328223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1328405Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1330124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1330302Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1332020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:28:45.1332196Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1334058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1334227Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1335165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.1335273Z return func(*args, **kwargs) 2025-12-04T09:28:45.1337147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1337316Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1339044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1339207Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1340917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1341114Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1342839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1343085Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1344081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:28:45.1344206Z return func(*args, **kwargs) 2025-12-04T09:28:45.1344666Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1345219Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1346225Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1346736Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1347741Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1348137Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1349242Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1349709Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1350623Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1351082Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1351993Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1352421Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1353425Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1353871Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1355220Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 623837184. 
2025-12-04T09:28:45.1355604Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1356212Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1357130Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1357450Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1358089Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1358585Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1358990Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1359478Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1360367Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1360816Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1361701Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1362099Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1362962Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1363397Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1364257Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1364693Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1365551Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1365959Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1366817Z [rank0]:E1204 09:27:44.750000 39902 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1367263Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1368637Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T09:28:45.1368993Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1369578Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1370485Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1370820Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1371456Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1371952Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1372352Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1372834Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1373721Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1374171Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1375105Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1375458Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1376382Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1377001Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1377979Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1378470Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1379436Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1379893Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1380857Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1381396Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1382932Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:28:45.1383336Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1383995Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1385016Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1385392Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1386112Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1386669Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1387118Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1387647Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1388658Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1389291Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1390180Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:28:45.1390535Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1391396Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1391831Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1392687Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1393130Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1393978Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1394383Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1395234Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1395709Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1397081Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 581894144 and is now 623837184. 
2025-12-04T09:28:45.1397412Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1397997Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1398907Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1399240Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1399879Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1400368Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1400460Z dist init r=0, world=4 2025-12-04T09:28:45.1400549Z dist init r=1, world=4 2025-12-04T09:28:45.1400643Z dist init r=2, world=4 2025-12-04T09:28:45.1400730Z dist init r=3, world=4 2025-12-04T09:28:45.1400819Z FAILED [9.0692s] [100%] 2025-12-04T09:28:45.1400824Z 2025-12-04T09:28:45.1400966Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1401201Z _________________ TestClipGradNormCUDA.test_no_gradients_cuda __________________ 2025-12-04T09:28:45.1401377Z Traceback (most recent call last): 2025-12-04T09:28:45.1401870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1401970Z self._join_processes(fn) 2025-12-04T09:28:45.1402497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1402626Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1403172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1403274Z raise RuntimeError(error) 2025-12-04T09:28:45.1403485Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1403604Z Traceback (most recent call last): 2025-12-04T09:28:45.1404083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1404187Z getattr(self, test_name)() 2025-12-04T09:28:45.1404671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1404754Z fn() 2025-12-04T09:28:45.1405217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1405312Z method(*args, **kwargs) 2025-12-04T09:28:45.1405760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1405863Z method(*args, **kwargs) 2025-12-04T09:28:45.1406338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1406428Z with policy(): 2025-12-04T09:28:45.1406889Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1407012Z raise RuntimeError(msg) 2025-12-04T09:28:45.1407969Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T09:28:45.1407974Z 2025-12-04T09:28:45.1408167Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1408676Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1408690Z 2025-12-04T09:28:45.1408931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1408936Z 2025-12-04T09:28:45.1409084Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.1409198Z Traceback (most recent call last): 2025-12-04T09:28:45.1409685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1409787Z getattr(self, test_name)() 2025-12-04T09:28:45.1410274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1410355Z fn() 2025-12-04T09:28:45.1410812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1410906Z method(*args, **kwargs) 2025-12-04T09:28:45.1411353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1411458Z method(*args, **kwargs) 2025-12-04T09:28:45.1411905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1411991Z with policy(): 2025-12-04T09:28:45.1412502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1412604Z raise RuntimeError(msg) 2025-12-04T09:28:45.1413560Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:28:45.1413565Z 2025-12-04T09:28:45.1413918Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1414455Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1414472Z 2025-12-04T09:28:45.1414723Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1414728Z 2025-12-04T09:28:45.1414732Z 2025-12-04T09:28:45.1414947Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1415202Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:45.1416041Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-1b5186457c75b3fb.xml - 2025-12-04T09:28:45.1416213Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1417171Z FAILED [9.0692s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1417294Z Traceback (most recent call last): 2025-12-04T09:28:45.1417895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1418007Z getattr(self, test_name)() 2025-12-04T09:28:45.1418550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1418681Z fn() 2025-12-04T09:28:45.1419192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1419305Z method(*args, **kwargs) 2025-12-04T09:28:45.1419805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1419912Z method(*args, **kwargs) 2025-12-04T09:28:45.1420423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1420522Z with policy(): 2025-12-04T09:28:45.1421227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1421340Z raise RuntimeError(msg) 2025-12-04T09:28:45.1422414Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 623837184. 
2025-12-04T09:28:45.1422422Z 2025-12-04T09:28:45.1422646Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1423221Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1423226Z 2025-12-04T09:28:45.1423502Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1423507Z 2025-12-04T09:28:45.1423672Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.1423795Z Traceback (most recent call last): 2025-12-04T09:28:45.1424349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1424557Z getattr(self, test_name)() 2025-12-04T09:28:45.1425110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1425200Z fn() 2025-12-04T09:28:45.1425711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1425826Z method(*args, **kwargs) 2025-12-04T09:28:45.1426330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1426435Z method(*args, **kwargs) 2025-12-04T09:28:45.1426950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1427051Z with policy(): 2025-12-04T09:28:45.1427568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1427686Z raise RuntimeError(msg) 2025-12-04T09:28:45.1428755Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:28:45.1428760Z 2025-12-04T09:28:45.1428983Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1429555Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1429560Z 2025-12-04T09:28:45.1429829Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1430051Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1430230Z ======================= 1 failed, 3 deselected in 9.09s ======================== 2025-12-04T09:28:45.1430337Z Got exit code 1 2025-12-04T09:28:45.1430447Z Retrying single test... 
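[editor's note] The UserWarnings repeated throughout this run ("FSDP got the argument `device_id` cuda ... which does not have an explicit index" and "barrier(): using the device under current context") spell out their own remedy: bind each rank to an indexed CUDA device before FSDP construction and pass that device to the process group. The sketch below is illustrative only, not the test's own code; the module, rank handling, and the availability of the `device_id` keyword on `init_process_group` (present in recent PyTorch releases) are assumptions.

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def init_fsdp_model(rank: int, world_size: int) -> FSDP:
    # An indexed device (cuda:<rank>) instead of the bare string "cuda" avoids the
    # "does not have an explicit index" warning from _init_utils.py.
    device = torch.device("cuda", rank)

    # Passing device_id here also silences the barrier() "using the device under
    # current context" warning from c10d_logger.py. MASTER_ADDR/MASTER_PORT are
    # assumed to be set by the launcher.
    dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)

    # Bind this process to its GPU before constructing FSDP, as the warning suggests.
    torch.cuda.set_device(device)

    model = nn.Linear(8, 8)  # stand-in module; the real test wraps its own model
    return FSDP(model, device_id=device)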
2025-12-04T09:28:45.1431203Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-74e02afb5846363a.xml 2025-12-04T09:28:45.1431377Z ============================= test session starts ============================== 2025-12-04T09:28:45.1431726Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1431840Z cachedir: .pytest_cache 2025-12-04T09:28:45.1432357Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1432591Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1432705Z configfile: pytest.ini 2025-12-04T09:28:45.1433228Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1433438Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.1434085Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda 2025-12-04T09:28:45.1434193Z Running 1 items in this shard 2025-12-04T09:28:45.1434198Z 2025-12-04T09:28:45.1446545Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda I1204 09:27:51.374000 40163 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 40215 2025-12-04T09:28:45.1447020Z I1204 09:27:51.375000 40163 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 40216 2025-12-04T09:28:45.1447499Z I1204 09:27:51.376000 40163 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 40217 2025-12-04T09:28:45.1447959Z I1204 09:27:51.377000 40163 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 40218 2025-12-04T09:28:45.1449702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1449862Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1451463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1451624Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1453231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:28:45.1453387Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1454980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1455169Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1456103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.1456338Z return func(*args, **kwargs) 2025-12-04T09:28:45.1458205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1458366Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1460069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1460230Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1461933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1462093Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1463910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1464073Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1465066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:28:45.1465175Z return func(*args, **kwargs) 2025-12-04T09:28:45.1465632Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1466169Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1467169Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1467681Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1468772Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1469243Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1470132Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1470563Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1471439Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1471865Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1472712Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1473106Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1473961Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1474397Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1475746Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 602865664 and is now 623837184. 
2025-12-04T09:28:45.1476074Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1476649Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1477609Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1477930Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1478564Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1479048Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1479446Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1479919Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1480801Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1481256Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1482132Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1482508Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1483364Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1483817Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1484669Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1485097Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1485949Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1486343Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1487198Z [rank0]:E1204 09:27:58.224000 40215 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1487636Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1488984Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T09:28:45.1489311Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1489937Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1490843Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1491162Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1491795Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1494509Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1494940Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1495411Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1496394Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1497038Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1498028Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1498504Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1499498Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1499985Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1500946Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1501434Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1502414Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1502864Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1503841Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1504332Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1505859Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T09:28:45.1506265Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1506927Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1507956Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1508315Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1509130Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1509663Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1510069Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1510547Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1511436Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1511892Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1512815Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:28:45.1513194Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1514058Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1514489Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1515351Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1515784Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1516641Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1517041Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1517892Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1518339Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1519717Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 581894144 and is now 623837184. 
2025-12-04T09:28:45.1520051Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1520633Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1521942Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1522310Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1523105Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1523652Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1523755Z dist init r=2, world=4 2025-12-04T09:28:45.1523861Z dist init r=0, world=4 2025-12-04T09:28:45.1523959Z dist init r=1, world=4 2025-12-04T09:28:45.1524056Z dist init r=3, world=4 2025-12-04T09:28:45.1524165Z FAILED [8.6485s] [100%] 2025-12-04T09:28:45.1524173Z 2025-12-04T09:28:45.1524321Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1524599Z _________________ TestClipGradNormCUDA.test_no_gradients_cuda __________________ 2025-12-04T09:28:45.1524758Z Traceback (most recent call last): 2025-12-04T09:28:45.1525310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1525432Z self._join_processes(fn) 2025-12-04T09:28:45.1526055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1526203Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1526808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1526920Z raise RuntimeError(error) 2025-12-04T09:28:45.1527155Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1527274Z Traceback (most recent call last): 2025-12-04T09:28:45.1527810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1527930Z getattr(self, test_name)() 2025-12-04T09:28:45.1528469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1528568Z fn() 2025-12-04T09:28:45.1529078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1529183Z method(*args, **kwargs) 2025-12-04T09:28:45.1529696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1529801Z method(*args, **kwargs) 2025-12-04T09:28:45.1530308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1530403Z with policy(): 2025-12-04T09:28:45.1530910Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1531026Z raise RuntimeError(msg) 2025-12-04T09:28:45.1532140Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 602865664 and is now 623837184. 2025-12-04T09:28:45.1532150Z 2025-12-04T09:28:45.1532374Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1532948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1532954Z 2025-12-04T09:28:45.1533217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1533223Z 2025-12-04T09:28:45.1533228Z 2025-12-04T09:28:45.1533460Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1533805Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:45.1534635Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-74e02afb5846363a.xml - 2025-12-04T09:28:45.1534788Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1535456Z FAILED [8.6485s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1535739Z Traceback (most recent call last): 2025-12-04T09:28:45.1536330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1536446Z getattr(self, test_name)() 2025-12-04T09:28:45.1537134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1537261Z fn() 2025-12-04T09:28:45.1537779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1537885Z method(*args, **kwargs) 2025-12-04T09:28:45.1538421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1538535Z method(*args, **kwargs) 2025-12-04T09:28:45.1539036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1539137Z with policy(): 2025-12-04T09:28:45.1539644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1539753Z raise RuntimeError(msg) 2025-12-04T09:28:45.1540833Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 602865664 and is now 623837184. 
2025-12-04T09:28:45.1540845Z 2025-12-04T09:28:45.1541059Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1541643Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1541648Z 2025-12-04T09:28:45.1541914Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1542089Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1542280Z ======================= 1 failed, 3 deselected in 8.67s ======================== 2025-12-04T09:28:45.1542374Z Got exit code 1 2025-12-04T09:28:45.1542881Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda 2025-12-04T09:28:45.1543290Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:45.1544035Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-39202840e4782b07.xml 2025-12-04T09:28:45.1544209Z ============================= test session starts ============================== 2025-12-04T09:28:45.1544558Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1544663Z cachedir: .pytest_cache 2025-12-04T09:28:45.1545186Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1545304Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1545416Z configfile: pytest.ini 2025-12-04T09:28:45.1545952Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1546155Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.1546299Z stepcurrent: skipping 3 already run items. 
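[editor's note] The "FAILED CONSISTENTLY" verdict above comes from the CUDA memory-leak check enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: the error text compares caching-allocator bytes and driver-level bytes on each device before and after the test body. The snippet below is a standalone sketch of that style of before/after comparison, assuming a single process and one visible GPU; it is not PyTorch's internal CudaMemoryLeakCheck, which is more careful (it retries, empties the cache, and runs per rank).

import gc
import torch

def check_for_cuda_leak(fn, device: int = 0) -> None:
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before                   # bytes held at the driver level

    fn()

    gc.collect()
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak: caching allocator {alloc_before} -> {alloc_after}, "
            f"driver {driver_before} -> {driver_after}"
        )

# Usage: a callable that stashes a tensor in a global keeps memory alive past the
# check and trips the comparison, roughly what the failing test is accused of.
_leaked = []
try:
    check_for_cuda_leak(lambda: _leaked.append(torch.ones(1, device="cuda")))
except RuntimeError as e:
    print(e)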
2025-12-04T09:28:45.1546447Z Running 1 items in this shard 2025-12-04T09:28:45.1546453Z 2025-12-04T09:28:45.1547403Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda I1204 09:28:04.754000 40476 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 40528 2025-12-04T09:28:45.1547907Z I1204 09:28:04.755000 40476 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 40529 2025-12-04T09:28:45.1548393Z I1204 09:28:04.756000 40476 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 40530 2025-12-04T09:28:45.1549094Z I1204 09:28:04.757000 40476 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 40531 2025-12-04T09:28:45.1549806Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1550291Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1551183Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1551661Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1552542Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1552898Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1553761Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1554196Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1555056Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1555483Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1556332Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1556743Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1557625Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1558071Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:28:45.1559406Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:28:45.1559738Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1560410Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1561302Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1561634Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1562267Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1562784Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1563188Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1563665Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1564581Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1565030Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1565916Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1566275Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1567145Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1567581Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1568440Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1568868Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1569719Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1570152Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1571013Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1571452Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1572802Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 489619456 and is now 630128640. 2025-12-04T09:28:45.1573136Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1573721Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1574612Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1574940Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1575576Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1576094Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1576578Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1577309Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1578305Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1578812Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1579808Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1580207Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1581182Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1581666Z 
[rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1582632Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1583120Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1584101Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1584561Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1585520Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1586023Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1587556Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:28:45.1587935Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1588694Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1589754Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1590136Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1590912Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1591433Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1591890Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1592385Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1593333Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1593808Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T09:28:45.1594749Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1595126Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1596038Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1596491Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1597484Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1597948Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1598798Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1599201Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1600050Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1600496Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1601851Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1602187Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1602769Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1603655Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1604017Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1604650Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1605166Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1605256Z dist init r=0, world=4 2025-12-04T09:28:45.1605343Z dist init r=1, world=4 2025-12-04T09:28:45.1605441Z dist init r=3, world=4 2025-12-04T09:28:45.1605526Z dist init r=2, world=4 2025-12-04T09:28:45.1606553Z [rank0]:[W1204 09:28:11.492306598 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.1606649Z FAILED [8.9305s] [100%] 2025-12-04T09:28:45.1606655Z 2025-12-04T09:28:45.1606785Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1607028Z ___________________ TestClipGradNormCUDA.test_non_root_cuda ____________________ 2025-12-04T09:28:45.1607136Z Traceback (most recent call last): 2025-12-04T09:28:45.1607622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1607733Z self._join_processes(fn) 2025-12-04T09:28:45.1608248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1608384Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1608917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1609020Z raise RuntimeError(error) 2025-12-04T09:28:45.1609241Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1609345Z Traceback (most recent call last): 2025-12-04T09:28:45.1609865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1609976Z getattr(self, test_name)() 2025-12-04T09:28:45.1610448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1610539Z fn() 2025-12-04T09:28:45.1610985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1611079Z method(*args, **kwargs) 2025-12-04T09:28:45.1611534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in 
wrapper 2025-12-04T09:28:45.1611628Z method(*args, **kwargs) 2025-12-04T09:28:45.1612099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1612195Z with policy(): 2025-12-04T09:28:45.1612644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1612748Z raise RuntimeError(msg) 2025-12-04T09:28:45.1613675Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:28:45.1613681Z 2025-12-04T09:28:45.1613876Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1614371Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1614403Z 2025-12-04T09:28:45.1614640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1614647Z 2025-12-04T09:28:45.1614804Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1614937Z Traceback (most recent call last): 2025-12-04T09:28:45.1615421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1615526Z getattr(self, test_name)() 2025-12-04T09:28:45.1616000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1616089Z fn() 2025-12-04T09:28:45.1616610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1616877Z method(*args, **kwargs) 2025-12-04T09:28:45.1617396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1617498Z method(*args, **kwargs) 2025-12-04T09:28:45.1618006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1618110Z with policy(): 2025-12-04T09:28:45.1618619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1618725Z raise RuntimeError(msg) 2025-12-04T09:28:45.1619780Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:28:45.1619786Z 2025-12-04T09:28:45.1619999Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1620564Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1620569Z 2025-12-04T09:28:45.1621023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1621096Z 2025-12-04T09:28:45.1621276Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.1621395Z Traceback (most recent call last): 2025-12-04T09:28:45.1621944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1622063Z getattr(self, test_name)() 2025-12-04T09:28:45.1622598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1622687Z fn() 2025-12-04T09:28:45.1623198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1623304Z method(*args, **kwargs) 2025-12-04T09:28:45.1623859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1623964Z method(*args, **kwargs) 2025-12-04T09:28:45.1624467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1624574Z with policy(): 2025-12-04T09:28:45.1625083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1625192Z raise RuntimeError(msg) 2025-12-04T09:28:45.1626247Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 489619456 and is now 630128640. 2025-12-04T09:28:45.1626289Z 2025-12-04T09:28:45.1626508Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1627069Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1627074Z 2025-12-04T09:28:45.1627336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1627380Z 2025-12-04T09:28:45.1627385Z 2025-12-04T09:28:45.1627607Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1627869Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:45.1628766Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-39202840e4782b07.xml - 2025-12-04T09:28:45.1628945Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1629678Z FAILED [8.9305s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1629802Z Traceback (most recent call last): 2025-12-04T09:28:45.1630359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1630472Z getattr(self, test_name)() 2025-12-04T09:28:45.1631017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1631106Z fn() 2025-12-04T09:28:45.1631621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1631723Z method(*args, **kwargs) 2025-12-04T09:28:45.1632226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1632341Z method(*args, **kwargs) 2025-12-04T09:28:45.1632944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1633031Z with policy(): 2025-12-04T09:28:45.1633519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1633619Z raise RuntimeError(msg) 2025-12-04T09:28:45.1634561Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1634566Z 2025-12-04T09:28:45.1634756Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1635247Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1635254Z 2025-12-04T09:28:45.1635492Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1635497Z 2025-12-04T09:28:45.1635671Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1635781Z Traceback (most recent call last): 2025-12-04T09:28:45.1636263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1636363Z getattr(self, test_name)() 2025-12-04T09:28:45.1636845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1636926Z fn() 2025-12-04T09:28:45.1637379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1637469Z method(*args, **kwargs) 2025-12-04T09:28:45.1637914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1638042Z method(*args, **kwargs) 2025-12-04T09:28:45.1638487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1638573Z with policy(): 2025-12-04T09:28:45.1639031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1639155Z raise RuntimeError(msg) 2025-12-04T09:28:45.1640083Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:28:45.1640088Z 2025-12-04T09:28:45.1640279Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1640772Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1640785Z 2025-12-04T09:28:45.1641019Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1641026Z 2025-12-04T09:28:45.1641170Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.1641287Z Traceback (most recent call last): 2025-12-04T09:28:45.1641771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1641871Z getattr(self, test_name)() 2025-12-04T09:28:45.1642351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1642429Z fn() 2025-12-04T09:28:45.1642885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1642975Z method(*args, **kwargs) 2025-12-04T09:28:45.1643424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1643521Z method(*args, **kwargs) 2025-12-04T09:28:45.1643990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1644079Z with policy(): 2025-12-04T09:28:45.1644536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1644634Z raise RuntimeError(msg) 2025-12-04T09:28:45.1645568Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 489619456 and is now 630128640. 2025-12-04T09:28:45.1645572Z 2025-12-04T09:28:45.1645761Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1646254Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1646268Z 2025-12-04T09:28:45.1646528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1646690Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1646851Z ======================= 1 failed, 3 deselected in 8.95s ======================== 2025-12-04T09:28:45.1646935Z Got exit code 1 2025-12-04T09:28:45.1647030Z Retrying single test... 
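With PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, the harness snapshots per-device CUDA memory before each test and compares it afterwards at two levels: bytes held by PyTorch's caching allocator and bytes the CUDA driver has handed to the process. Both levels moved on every rank here: the allocator went from 512 to 2560 bytes (2048 bytes never freed), and on device 0 the driver-allocated figure grew from 714014720 to 739180544 bytes, a delta of 25165824 bytes (24 MiB); devices 1-3 report the same allocator delta with driver deltas of similar magnitude. The driver-side confirmation is what produces the wording "CUDA driver API confirmed a leak" rather than a false positive from allocator caching. Below is a minimal sketch of that before/after idea in Python, assuming only the public torch.cuda introspection APIs; MemLeakCheck is an illustrative name and this is not PyTorch's actual CudaMemoryLeakCheck from common_utils.py.

import gc
import torch

class MemLeakCheck:
    """Illustrative before/after CUDA memory comparison (hypothetical sketch,
    not the harness's real leak checker)."""

    def __enter__(self):
        gc.collect()
        n = torch.cuda.device_count()
        for i in range(n):
            torch.cuda.synchronize(i)
        # Bytes currently held by the caching allocator, per device.
        self.alloc_before = [torch.cuda.memory_allocated(i) for i in range(n)]
        # mem_get_info returns (free, total) from the driver; total - free
        # approximates the "CUDA driver allocated memory" figure in the log.
        self.driver_before = [
            total - free
            for free, total in (torch.cuda.mem_get_info(i) for i in range(n))
        ]
        return self

    def __exit__(self, *exc):
        gc.collect()
        for i in range(torch.cuda.device_count()):
            torch.cuda.synchronize(i)
            alloc_after = torch.cuda.memory_allocated(i)
            free, total = torch.cuda.mem_get_info(i)
            driver_after = total - free
            # Flag only when the driver-side number confirms allocator growth,
            # mirroring the "driver API confirmed a leak" message above.
            if alloc_after > self.alloc_before[i] and driver_after > self.driver_before[i]:
                raise RuntimeError(
                    f"possible CUDA leak on device {i}: allocator "
                    f"{self.alloc_before[i]} -> {alloc_after} bytes, "
                    f"driver {self.driver_before[i]} -> {driver_after} bytes"
                )

A harness would wrap each test body as "with MemLeakCheck(): ...", which is also why the printed repro command works: re-running the single test with the same environment variable set repeats the comparison in isolation.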
2025-12-04T09:28:45.1647672Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-067163aa862fde85.xml 2025-12-04T09:28:45.1647819Z ============================= test session starts ============================== 2025-12-04T09:28:45.1648138Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1648690Z cachedir: .pytest_cache 2025-12-04T09:28:45.1649151Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1649267Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1649363Z configfile: pytest.ini 2025-12-04T09:28:45.1649879Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1650074Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.1650642Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda 2025-12-04T09:28:45.1650751Z Running 1 items in this shard 2025-12-04T09:28:45.1650756Z 2025-12-04T09:28:45.1651583Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda I1204 09:28:17.893000 40813 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 40865 2025-12-04T09:28:45.1652033Z I1204 09:28:17.894000 40813 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 40866 2025-12-04T09:28:45.1652482Z I1204 09:28:17.895000 40813 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 40867 2025-12-04T09:28:45.1652919Z I1204 09:28:17.896000 40813 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 40868 2025-12-04T09:28:45.1653332Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1653807Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1654701Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1655161Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1656062Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1656511Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1657625Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1658124Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1659128Z 
[rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1659623Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1660594Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1661041Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1662017Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1662540Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1664054Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:28:45.1664447Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1665119Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1666129Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1666496Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1667224Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1667772Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1668236Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1668870Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1669878Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1670362Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1671245Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1671605Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1672455Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1672902Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1673785Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1674219Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1675072Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1675469Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1676363Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1676802Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1678167Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:28:45.1678487Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1679071Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1679973Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1680296Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1680937Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1681420Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1681828Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1682302Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1683211Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1683672Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1684551Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1684906Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1685758Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1686228Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1687090Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1687521Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1688379Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1688801Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1689659Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1690116Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1691448Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 489619456 and is now 630128640. 2025-12-04T09:28:45.1691766Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1692350Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1693250Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1693570Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1694210Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1694691Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1695100Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1695592Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1696552Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1697224Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1698211Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1698616Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1699619Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1700112Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1701083Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:28:45.1701573Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1702550Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1703041Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1704048Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1704540Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1706043Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:28:45.1706409Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1707068Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1708079Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1708440Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1709372Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1709901Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1709998Z dist init r=1, world=4 2025-12-04T09:28:45.1710107Z dist init r=0, world=4 2025-12-04T09:28:45.1710227Z dist init r=2, world=4 2025-12-04T09:28:45.1710332Z dist init r=3, world=4 2025-12-04T09:28:45.1711451Z [rank0]:[W1204 09:28:25.634887694 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.1711545Z FAILED [8.8750s] [100%] 2025-12-04T09:28:45.1711551Z 2025-12-04T09:28:45.1711701Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1711948Z ___________________ TestClipGradNormCUDA.test_non_root_cuda ____________________ 2025-12-04T09:28:45.1712074Z Traceback (most recent call last): 2025-12-04T09:28:45.1712605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1712758Z self._join_processes(fn) 2025-12-04T09:28:45.1713338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1713475Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1714062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1714182Z raise RuntimeError(error) 2025-12-04T09:28:45.1714407Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1714538Z Traceback (most recent call last): 2025-12-04T09:28:45.1715058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1715193Z getattr(self, test_name)() 2025-12-04T09:28:45.1715725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1715811Z fn() 2025-12-04T09:28:45.1716311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1716451Z method(*args, **kwargs) 2025-12-04T09:28:45.1716938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1717050Z method(*args, **kwargs) 2025-12-04T09:28:45.1717533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1717628Z with policy(): 2025-12-04T09:28:45.1718133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1718237Z raise RuntimeError(msg) 2025-12-04T09:28:45.1719254Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1719273Z 2025-12-04T09:28:45.1719479Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1720015Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1720020Z 2025-12-04T09:28:45.1720288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1720293Z 2025-12-04T09:28:45.1720447Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1720579Z Traceback (most recent call last): 2025-12-04T09:28:45.1721420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1721535Z getattr(self, test_name)() 2025-12-04T09:28:45.1722173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1722327Z fn() 2025-12-04T09:28:45.1722835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1722947Z method(*args, **kwargs) 2025-12-04T09:28:45.1723447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1723560Z method(*args, **kwargs) 2025-12-04T09:28:45.1724064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1724159Z with policy(): 2025-12-04T09:28:45.1724679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1724792Z raise RuntimeError(msg) 2025-12-04T09:28:45.1725894Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:28:45.1725903Z 2025-12-04T09:28:45.1726122Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1726675Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1726681Z 2025-12-04T09:28:45.1726951Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1726957Z 2025-12-04T09:28:45.1727123Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.1727251Z Traceback (most recent call last): 2025-12-04T09:28:45.1727835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1727953Z getattr(self, test_name)() 2025-12-04T09:28:45.1728500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1728635Z fn() 2025-12-04T09:28:45.1729142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1729254Z method(*args, **kwargs) 2025-12-04T09:28:45.1729758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1729860Z method(*args, **kwargs) 2025-12-04T09:28:45.1730372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1730471Z with policy(): 2025-12-04T09:28:45.1730988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1731097Z raise RuntimeError(msg) 2025-12-04T09:28:45.1732145Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:28:45.1732162Z 2025-12-04T09:28:45.1732375Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1733030Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1733035Z 2025-12-04T09:28:45.1733298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1733303Z 2025-12-04T09:28:45.1733307Z 2025-12-04T09:28:45.1733520Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1733881Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:45.1734751Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-067163aa862fde85.xml - 2025-12-04T09:28:45.1734910Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1735600Z FAILED [8.8750s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1735713Z Traceback (most recent call last): 2025-12-04T09:28:45.1736230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1736396Z getattr(self, test_name)() 2025-12-04T09:28:45.1737086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1737181Z fn() 2025-12-04T09:28:45.1737721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1737829Z method(*args, **kwargs) 2025-12-04T09:28:45.1738343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1738448Z method(*args, **kwargs) 2025-12-04T09:28:45.1738958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1739054Z with policy(): 2025-12-04T09:28:45.1739563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1739677Z raise RuntimeError(msg) 2025-12-04T09:28:45.1740750Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1740757Z 2025-12-04T09:28:45.1740979Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1741562Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1741567Z 2025-12-04T09:28:45.1741831Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1741836Z 2025-12-04T09:28:45.1742004Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1742127Z Traceback (most recent call last): 2025-12-04T09:28:45.1742680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1742791Z getattr(self, test_name)() 2025-12-04T09:28:45.1743327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1743423Z fn() 2025-12-04T09:28:45.1743928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1744035Z method(*args, **kwargs) 2025-12-04T09:28:45.1744546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1744649Z method(*args, **kwargs) 2025-12-04T09:28:45.1745163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1745257Z with policy(): 2025-12-04T09:28:45.1745766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1745882Z raise RuntimeError(msg) 2025-12-04T09:28:45.1746955Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:28:45.1746963Z 2025-12-04T09:28:45.1747183Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1747734Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1747739Z 2025-12-04T09:28:45.1748002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1748007Z 2025-12-04T09:28:45.1748176Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.1748292Z Traceback (most recent call last): 2025-12-04T09:28:45.1748945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1749042Z getattr(self, test_name)() 2025-12-04T09:28:45.1749555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1749643Z fn() 2025-12-04T09:28:45.1750092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1750183Z method(*args, **kwargs) 2025-12-04T09:28:45.1750640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1750729Z method(*args, **kwargs) 2025-12-04T09:28:45.1751179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1751263Z with policy(): 2025-12-04T09:28:45.1751707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1751834Z raise RuntimeError(msg) 2025-12-04T09:28:45.1752758Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:28:45.1752789Z 2025-12-04T09:28:45.1752989Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1753474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1753479Z 2025-12-04T09:28:45.1753709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1753869Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1754023Z ======================= 1 failed, 3 deselected in 8.90s ======================== 2025-12-04T09:28:45.1754118Z Got exit code 1 2025-12-04T09:28:45.1754213Z Retrying single test... 
2025-12-04T09:28:45.1754855Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-adf2403f35f3c235.xml 2025-12-04T09:28:45.1755011Z ============================= test session starts ============================== 2025-12-04T09:28:45.1755316Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1755408Z cachedir: .pytest_cache 2025-12-04T09:28:45.1755871Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1755976Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1756076Z configfile: pytest.ini 2025-12-04T09:28:45.1756544Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1756725Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.1757297Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda 2025-12-04T09:28:45.1757478Z Running 1 items in this shard 2025-12-04T09:28:45.1757486Z 2025-12-04T09:28:45.1758317Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda I1204 09:28:31.074000 41150 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 41202 2025-12-04T09:28:45.1758754Z I1204 09:28:31.075000 41150 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 41203 2025-12-04T09:28:45.1759190Z I1204 09:28:31.076000 41150 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 41204 2025-12-04T09:28:45.1759632Z I1204 09:28:31.077000 41150 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 41205 2025-12-04T09:28:45.1760059Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1760540Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1761433Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1761884Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1762771Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1763147Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1764005Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1764460Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1765313Z 
[rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1765740Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1766588Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1766991Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1767841Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1768278Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1769599Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:28:45.1769928Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1770539Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1771434Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1771755Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1772383Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1772895Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1773298Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1773773Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1774659Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1775106Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1776005Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1776444Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1777593Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1778075Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1779034Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1779517Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1780476Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1780928Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1781890Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1782383Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1783911Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:28:45.1784279Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1784930Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1785933Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1786298Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1787033Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1787585Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1788033Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1788567Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1789590Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1790062Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1790944Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1791318Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1792174Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1792602Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1793460Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1793895Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1794744Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1795142Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1795991Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1796433Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1797794Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:28:45.1798122Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1798703Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1799585Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1799914Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1800577Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1801069Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1801467Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1801935Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1802823Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1803303Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1804188Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1804564Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1805411Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1805838Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1806687Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:28:45.1807122Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1807969Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1808364Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1809214Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1809656Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1811014Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:28:45.1811339Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1811925Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1812835Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1813156Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1813792Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1814271Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1814362Z dist init r=3, world=4 2025-12-04T09:28:45.1814448Z dist init r=1, world=4 2025-12-04T09:28:45.1814534Z dist init r=0, world=4 2025-12-04T09:28:45.1814626Z dist init r=2, world=4 2025-12-04T09:28:45.1815653Z [rank0]:[W1204 09:28:38.818750524 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.1815775Z FAILED [8.9108s] [100%] 2025-12-04T09:28:45.1815780Z 2025-12-04T09:28:45.1815912Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1816166Z ___________________ TestClipGradNormCUDA.test_non_root_cuda ____________________ 2025-12-04T09:28:45.1816345Z Traceback (most recent call last): 2025-12-04T09:28:45.1817024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1817136Z self._join_processes(fn) 2025-12-04T09:28:45.1817726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1817865Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1818478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1818593Z raise RuntimeError(error) 2025-12-04T09:28:45.1818826Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1818954Z Traceback (most recent call last): 2025-12-04T09:28:45.1819494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1819607Z getattr(self, test_name)() 2025-12-04T09:28:45.1820137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1820224Z fn() 2025-12-04T09:28:45.1820907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1821019Z method(*args, **kwargs) 2025-12-04T09:28:45.1821527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1821631Z method(*args, **kwargs) 2025-12-04T09:28:45.1822199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1822304Z with policy(): 2025-12-04T09:28:45.1822814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1822923Z raise RuntimeError(msg) 2025-12-04T09:28:45.1823979Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1823986Z 2025-12-04T09:28:45.1824200Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1824764Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1824770Z 2025-12-04T09:28:45.1825080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1825089Z 2025-12-04T09:28:45.1825251Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1825381Z Traceback (most recent call last): 2025-12-04T09:28:45.1825922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1826037Z getattr(self, test_name)() 2025-12-04T09:28:45.1826570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1826654Z fn() 2025-12-04T09:28:45.1827164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1827307Z method(*args, **kwargs) 2025-12-04T09:28:45.1827812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1827918Z method(*args, **kwargs) 2025-12-04T09:28:45.1828421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1828560Z with policy(): 2025-12-04T09:28:45.1829065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1829169Z raise RuntimeError(msg) 2025-12-04T09:28:45.1830216Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:28:45.1830224Z 2025-12-04T09:28:45.1830434Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1830990Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1830996Z 2025-12-04T09:28:45.1831260Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1831267Z 2025-12-04T09:28:45.1831425Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.1831547Z Traceback (most recent call last): 2025-12-04T09:28:45.1832090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1832202Z getattr(self, test_name)() 2025-12-04T09:28:45.1832735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1832927Z fn() 2025-12-04T09:28:45.1833422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1833519Z method(*args, **kwargs) 2025-12-04T09:28:45.1834032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1834137Z method(*args, **kwargs) 2025-12-04T09:28:45.1834622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1834719Z with policy(): 2025-12-04T09:28:45.1835208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1835310Z raise RuntimeError(msg) 2025-12-04T09:28:45.1836329Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:28:45.1836336Z 2025-12-04T09:28:45.1836540Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1837107Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1837115Z 2025-12-04T09:28:45.1837371Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1837376Z 2025-12-04T09:28:45.1837381Z 2025-12-04T09:28:45.1837590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1837848Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:45.1838714Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-adf2403f35f3c235.xml - 2025-12-04T09:28:45.1838881Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1839617Z FAILED [8.9108s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1839731Z Traceback (most recent call last): 2025-12-04T09:28:45.1840266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1840413Z getattr(self, test_name)() 2025-12-04T09:28:45.1840933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1841016Z fn() 2025-12-04T09:28:45.1841501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1841603Z method(*args, **kwargs) 2025-12-04T09:28:45.1842091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1842190Z method(*args, **kwargs) 2025-12-04T09:28:45.1842688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1842784Z with policy(): 2025-12-04T09:28:45.1843375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1843475Z raise RuntimeError(msg) 2025-12-04T09:28:45.1844462Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1844475Z 2025-12-04T09:28:45.1844675Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1845192Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1845199Z 2025-12-04T09:28:45.1845450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1845455Z 2025-12-04T09:28:45.1845631Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1845745Z Traceback (most recent call last): 2025-12-04T09:28:45.1846260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1846361Z getattr(self, test_name)() 2025-12-04T09:28:45.1846870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1846950Z fn() 2025-12-04T09:28:45.1847423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1847523Z method(*args, **kwargs) 2025-12-04T09:28:45.1847994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1848093Z method(*args, **kwargs) 2025-12-04T09:28:45.1848594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1848686Z with policy(): 2025-12-04T09:28:45.1849171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1849272Z raise RuntimeError(msg) 2025-12-04T09:28:45.1850248Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:28:45.1850260Z 2025-12-04T09:28:45.1850462Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1851007Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1851013Z 2025-12-04T09:28:45.1851266Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1851273Z 2025-12-04T09:28:45.1851449Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.1851558Z Traceback (most recent call last): 2025-12-04T09:28:45.1852076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1852176Z getattr(self, test_name)() 2025-12-04T09:28:45.1852683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1852762Z fn() 2025-12-04T09:28:45.1853234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1853340Z method(*args, **kwargs) 2025-12-04T09:28:45.1853810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1853915Z method(*args, **kwargs) 2025-12-04T09:28:45.1854385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1854477Z with policy(): 2025-12-04T09:28:45.1854962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1855061Z raise RuntimeError(msg) 2025-12-04T09:28:45.1856037Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:28:45.1856052Z 2025-12-04T09:28:45.1856321Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1857022Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1857028Z 2025-12-04T09:28:45.1857333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1857514Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:28:45.1857690Z ======================= 1 failed, 3 deselected in 8.93s ======================== 2025-12-04T09:28:45.1857797Z Got exit code 1 2025-12-04T09:28:45.1858271Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda 2025-12-04T09:28:45.1858682Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:45.1859396Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-36b91fd354097cab.xml 2025-12-04T09:28:45.1859556Z ============================= test session starts ============================== 2025-12-04T09:28:45.1859936Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1860046Z cachedir: .pytest_cache 2025-12-04T09:28:45.1860564Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1860683Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1860788Z configfile: pytest.ini 2025-12-04T09:28:45.1861333Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1861537Z collecting ... collected 4 items / 4 deselected / 0 selected 2025-12-04T09:28:45.1861675Z stepcurrent: skipping 4 already run items. 2025-12-04T09:28:45.1861819Z Running 0 items in this shard 2025-12-04T09:28:45.1861825Z 2025-12-04T09:28:45.1862722Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-36b91fd354097cab.xml - 2025-12-04T09:28:45.1862891Z ============================ 4 deselected in 0.01s ============================= 2025-12-04T09:28:45.1864946Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda', 'test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda', 'test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda', 'test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda'] 2025-12-04T09:28:45.1864953Z 2025-12-04T09:28:45.1865643Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_clip_grad_norm 1/1 (test/test-reports/distributed.fsdp.test_fsdp_clip_grad_norm_1.1_4959fae61140b3a8_.log) 2025-12-04T09:28:45.1865651Z 2025-12-04T09:28:45.1866073Z Finished distributed/fsdp/test_fsdp_clip_grad_norm 1/1 ... 
[2025-12-04 09:28:44.964469][2156.572384067], took 3.40min 2025-12-04T09:28:45.1867025Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a06a4188d644524d.xml 2025-12-04T09:28:45.1867987Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-03186403898f3bbb.xml 2025-12-04T09:28:45.1869113Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a3dc994784795bc1.xml 2025-12-04T09:28:45.1869946Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-b1d6139c1033a518.xml 2025-12-04T09:28:45.1870803Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-ebdc3db326996caa.xml 2025-12-04T09:28:45.1871641Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-c42bc725a7562377.xml 2025-12-04T09:28:45.2060962Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-4818210284e31d5e.xml 2025-12-04T09:28:45.2366091Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-1b5186457c75b3fb.xml 2025-12-04T09:28:45.2644813Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-74e02afb5846363a.xml 2025-12-04T09:28:45.2932739Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-39202840e4782b07.xml 2025-12-04T09:28:45.3243529Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-067163aa862fde85.xml 2025-12-04T09:28:45.3554153Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-adf2403f35f3c235.xml 2025-12-04T09:28:45.3843133Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-36b91fd354097cab.xml 2025-12-04T09:28:45.7098722Z Uploading logs for 57116084904 to S3 2025-12-04T09:28:45.7669412Z Uploading artifacts took 0.36 seconds 2025-12-04T09:28:45.7669911Z distributed/fsdp/test_fsdp_clip_grad_norm 1/1 failed! 2025-12-04T09:28:45.7675845Z Running distributed/fsdp/test_fsdp_core 2/2 ... 
[2025-12-04 09:28:45.766985][2157.374901957] 2025-12-04T09:28:45.7676435Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:28:45.7677689Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_core.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:28:45.767316] 2025-12-04T09:59:12.9994212Z 2025-12-04T09:59:12.9995161Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_core 2/2 (test/test-reports/distributed.fsdp.test_fsdp_core_2.2_6137898c6891d430_.log) 2025-12-04T09:59:12.9996538Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90a070d9a0caeaa7.xml 2025-12-04T09:59:12.9997499Z ============================= test session starts ============================== 2025-12-04T09:59:12.9998170Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:12.9998772Z cachedir: .pytest_cache 2025-12-04T09:59:12.9999479Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.0000259Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.0000604Z configfile: pytest.ini 2025-12-04T09:59:13.0001331Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.0002126Z collecting ... collected 60 items 2025-12-04T09:59:13.0002555Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:59:13.0019650Z Running 27 items in this shard: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda, 
test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda, test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda 2025-12-04T09:59:13.0036710Z 2025-12-04T09:59:13.0037775Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda I1204 09:28:49.174000 41544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 41596 2025-12-04T09:59:13.0039471Z I1204 09:28:49.175000 41544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 41597 2025-12-04T09:59:13.0040619Z I1204 09:28:49.176000 41544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 41598 2025-12-04T09:59:13.0041750Z I1204 09:28:49.176000 41544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 41599 2025-12-04T09:59:13.0043740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0045275Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0047260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.0049289Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0050843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0052414Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0053916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0055428Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0057496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0059580Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0061614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0063689Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0065235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0066752Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0068728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0070760Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0075608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0080654Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0085725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0090739Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0095804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0101171Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0106227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0111250Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0112264Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0113440Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0115148Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0116828Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0118491Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0120048Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0122255Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0123890Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0125511Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0127124Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0128739Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0130346Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0131956Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0133686Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0139159Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 611254272 and is now 634322944. 
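The FSDP warnings repeated above ("FSDP got the argument `device_id` cuda on rank N, which does not have an explicit index") point at two remedies: call torch.cuda.set_device() before constructing FSDP, or pass an indexed device as device_id. A minimal sketch of both, assuming a per-process rank variable, an already-initialized process group, and a placeholder module rather than the test's actual transformer:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(rank: int) -> FSDP:
        # Bind this process to its GPU first, as the warning suggests...
        torch.cuda.set_device(rank)
        model = nn.Linear(8, 8).cuda()  # placeholder module, not the test's model
        # ...and/or hand FSDP an explicit, indexed device instead of the bare "cuda" string.
        return FSDP(model, device_id=torch.device("cuda", rank))

Either step removes the ambiguity that makes FSDP fall back to "the current device".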
2025-12-04T09:59:13.0141362Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0142558Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0144498Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0146134Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0147374Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0148805Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.0150045Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0151184Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0152884Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0154553Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0156214Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0157787Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0159305Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0160917Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0162524Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0164167Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0165766Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0167459Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0168975Z [rank1]:E1204 09:28:56.158000 41597 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0170535Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0172784Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.0174862Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0176006Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0178192Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0179823Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0181064Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0182504Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.0183661Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0184801Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0186492Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0188149Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0189930Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0191426Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0192896Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0194451Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0196028Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0197586Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0199178Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0200683Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0202195Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0203743Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0205965Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.0208059Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0209205Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0211069Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0212636Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0213862Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0215228Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.0216429Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0217753Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0219435Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0221385Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0223064Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0224607Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0226121Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0227770Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0229374Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0231010Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0232609Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0234189Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0235665Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0237192Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0239362Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 
2025-12-04T09:59:13.0241459Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0242556Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0244419Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0246245Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0247437Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0248811Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.0249637Z dist init r=0, world=4 2025-12-04T09:59:13.0249933Z dist init r=1, world=4 2025-12-04T09:59:13.0250216Z dist init r=2, world=4 2025-12-04T09:59:13.0250490Z dist init r=3, world=4 2025-12-04T09:59:13.0251844Z [rank0]:[W1204 09:28:56.178164705 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.0253206Z FAILED [9.2706s] [ 3%] 2025-12-04T09:59:13.0253394Z 2025-12-04T09:59:13.0253545Z =================================== FAILURES =================================== 2025-12-04T09:59:13.0254126Z ___ TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda ____ 2025-12-04T09:59:13.0254666Z Traceback (most recent call last): 2025-12-04T09:59:13.0255442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.0256227Z self._join_processes(fn) 2025-12-04T09:59:13.0257104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.0258192Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.0259089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.0259989Z raise RuntimeError(error) 2025-12-04T09:59:13.0260445Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.0260928Z Traceback (most recent call last): 2025-12-04T09:59:13.0261716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0262517Z getattr(self, test_name)() 2025-12-04T09:59:13.0263264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0264042Z fn() 2025-12-04T09:59:13.0264702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0265466Z method(*args, **kwargs) 2025-12-04T09:59:13.0266179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.0266945Z method(*args, **kwargs) 2025-12-04T09:59:13.0267662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0268547Z with policy(): 2025-12-04T09:59:13.0269219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0270066Z raise RuntimeError(msg) 2025-12-04T09:59:13.0271399Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.0272648Z 2025-12-04T09:59:13.0272855Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0273863Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0274636Z 2025-12-04T09:59:13.0274891Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0275270Z 2025-12-04T09:59:13.0275275Z 2025-12-04T09:59:13.0275505Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.0276090Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.0277411Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90a070d9a0caeaa7.xml - 2025-12-04T09:59:13.0278504Z =========================== short test summary info ============================ 2025-12-04T09:59:13.0279661Z FAILED [9.2706s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.0280721Z Traceback (most recent call last): 2025-12-04T09:59:13.0281483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0282264Z getattr(self, test_name)() 2025-12-04T09:59:13.0282999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0283738Z fn() 2025-12-04T09:59:13.0284370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0285112Z method(*args, **kwargs) 2025-12-04T09:59:13.0285842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0286566Z method(*args, **kwargs) 2025-12-04T09:59:13.0287375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0288317Z with policy(): 2025-12-04T09:59:13.0289070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0289798Z raise RuntimeError(msg) 2025-12-04T09:59:13.0291125Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! 
Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.0292371Z 2025-12-04T09:59:13.0292590Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0293554Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0294325Z 2025-12-04T09:59:13.0294581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0295144Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.0295598Z ============================== 1 failed in 9.49s =============================== 2025-12-04T09:59:13.0295963Z Got exit code 1 2025-12-04T09:59:13.0296221Z Retrying single test... 2025-12-04T09:59:13.0297264Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b56b818e7dab969.xml 2025-12-04T09:59:13.0298196Z ============================= test session starts ============================== 2025-12-04T09:59:13.0298845Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.0299449Z cachedir: .pytest_cache 2025-12-04T09:59:13.0300161Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.0300931Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.0301330Z configfile: pytest.ini 2025-12-04T09:59:13.0302062Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.0302956Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.0304052Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0305047Z Running 1 items in this shard 2025-12-04T09:59:13.0305261Z 2025-12-04T09:59:13.0306308Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda I1204 09:29:03.213000 41881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 41933 2025-12-04T09:59:13.0308012Z I1204 09:29:03.214000 41881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 41934 2025-12-04T09:59:13.0309231Z I1204 09:29:03.215000 41881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 41935 2025-12-04T09:59:13.0310302Z I1204 09:29:03.216000 41881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 41936 2025-12-04T09:59:13.0312080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0313500Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0315407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0317323Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0318773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0320181Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0322044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0323573Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0325541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0327564Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0329591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0331612Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0333222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0334697Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0336538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0338721Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0343583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. 
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0348633Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0353244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0357716Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0362221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0366667Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0371166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. 
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0375607Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0376550Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0377863Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0379556Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0381260Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0382896Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0384473Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0385986Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0387589Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0389384Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0390798Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0392458Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0393936Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0395411Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0396938Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0399318Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! 
Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 2025-12-04T09:59:13.0401419Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0402568Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0404474Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0406070Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0407257Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0408629Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.0409753Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0410858Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0412517Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0414270Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0415835Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0417583Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0419099Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0420694Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0422541Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0424153Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0425757Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0427328Z [rank1]:E1204 09:29:10.180000 41934 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0428954Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0430571Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0432978Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 611254272 and is now 634322944. 2025-12-04T09:59:13.0435134Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0436286Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0438090Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0439621Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0440784Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0442106Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.0443230Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0444368Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0445869Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0447388Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0448859Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0450216Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0451570Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0452996Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
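The AccumulateGrad stream-mismatch UserWarning that precedes each of these failures names its own opt-out. When the mismatch is known to be intentional, the warning text says it can be silenced with the call below (a minimal sketch; whether suppressing it is appropriate depends on the workload, since the same warning can also flag a real synchronization problem):

    import torch

    # Named directly in the warning text above; disables the
    # AccumulateGrad stream-mismatch warning process-wide.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)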
2025-12-04T09:59:13.0454422Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0455841Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0457580Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0459195Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0460767Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0462376Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0464669Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.0466839Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0468018Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0470054Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0471512Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0472608Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0473876Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.0474895Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0475941Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0477436Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0478896Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0480367Z [rank2]:E1204 09:29:10.181000 41935 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0481737Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0483083Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0484505Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0485917Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0487345Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0488792Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0490176Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0491567Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0492981Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0495039Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 
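The enable_nested_tensor UserWarning repeated in this rerun fires because the encoder layer was constructed without batch_first=True, so TransformerEncoder silently disables its nested-tensor fast path. A minimal sketch of a layer/encoder pair that keeps the fast path available (the dimensions are placeholders, not the test model's):

    import torch.nn as nn

    encoder_layer = nn.TransformerEncoderLayer(
        d_model=64,
        nhead=4,
        batch_first=True,  # required for use_nested_tensor to stay enabled
    )
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=2, enable_nested_tensor=True)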
2025-12-04T09:59:13.0497239Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0498424Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0500353Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0501965Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0503246Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0504663Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.0505490Z dist init r=2, world=4 2025-12-04T09:59:13.0505766Z dist init r=0, world=4 2025-12-04T09:59:13.0506053Z dist init r=3, world=4 2025-12-04T09:59:13.0506332Z dist init r=1, world=4 2025-12-04T09:59:13.0507676Z [rank0]:[W1204 09:29:10.199158514 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.0509151Z FAILED [9.5626s] [100%] 2025-12-04T09:59:13.0509327Z 2025-12-04T09:59:13.0509467Z =================================== FAILURES =================================== 2025-12-04T09:59:13.0510003Z ___ TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda ____ 2025-12-04T09:59:13.0510503Z Traceback (most recent call last): 2025-12-04T09:59:13.0511212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.0511930Z self._join_processes(fn) 2025-12-04T09:59:13.0512647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.0513414Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.0514249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.0515130Z raise RuntimeError(error) 2025-12-04T09:59:13.0515694Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.0516400Z Traceback (most recent call last): 2025-12-04T09:59:13.0517166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0518032Z getattr(self, test_name)() 2025-12-04T09:59:13.0518865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0568362Z fn() 2025-12-04T09:59:13.0569017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0569718Z method(*args, **kwargs) 2025-12-04T09:59:13.0570358Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0571047Z method(*args, **kwargs) 2025-12-04T09:59:13.0571691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0572378Z with policy(): 2025-12-04T09:59:13.0573101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0573805Z raise RuntimeError(msg) 2025-12-04T09:59:13.0575071Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 2025-12-04T09:59:13.0576266Z 2025-12-04T09:59:13.0576584Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0577766Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0578587Z 2025-12-04T09:59:13.0578935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0579339Z 2025-12-04T09:59:13.0579522Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.0579959Z Traceback (most recent call last): 2025-12-04T09:59:13.0580751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0581617Z getattr(self, test_name)() 2025-12-04T09:59:13.0582386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0583151Z fn() 2025-12-04T09:59:13.0583807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0584577Z method(*args, **kwargs) 2025-12-04T09:59:13.0585297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0586047Z method(*args, **kwargs) 2025-12-04T09:59:13.0586765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0587526Z with policy(): 2025-12-04T09:59:13.0588201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0589079Z raise RuntimeError(msg) 2025-12-04T09:59:13.0590331Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 
2025-12-04T09:59:13.0591517Z 2025-12-04T09:59:13.0591725Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0592627Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0593355Z 2025-12-04T09:59:13.0593593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0593964Z 2025-12-04T09:59:13.0593968Z 2025-12-04T09:59:13.0594219Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.0594790Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.0595870Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b56b818e7dab969.xml - 2025-12-04T09:59:13.0596850Z =========================== short test summary info ============================ 2025-12-04T09:59:13.0597880Z FAILED [9.5626s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.0598852Z Traceback (most recent call last): 2025-12-04T09:59:13.0599568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0600300Z getattr(self, test_name)() 2025-12-04T09:59:13.0600987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0601688Z fn() 2025-12-04T09:59:13.0602262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0602949Z method(*args, **kwargs) 2025-12-04T09:59:13.0603594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0604277Z method(*args, **kwargs) 2025-12-04T09:59:13.0604900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0605623Z with policy(): 2025-12-04T09:59:13.0606240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0606917Z raise RuntimeError(msg) 2025-12-04T09:59:13.0608187Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 
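The failure block prints a ready-made shell repro (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py ...). The same run can be driven from Python when scripting local triage; the environment-variable names and test path below are taken verbatim from the log, while the subprocess wrapper itself is only an illustration:

    import os
    import subprocess
    import sys

    env = dict(os.environ)
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"   # enable the leak check, as in the repro command
    env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"     # optional: silence the repro banner on failure
    subprocess.run(
        [
            sys.executable,
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda",
        ],
        env=env,
        check=False,  # inspect the exit code instead of raising
    )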
2025-12-04T09:59:13.0609409Z 2025-12-04T09:59:13.0609606Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0610518Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0611229Z 2025-12-04T09:59:13.0611486Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0611842Z 2025-12-04T09:59:13.0611991Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.0612374Z Traceback (most recent call last): 2025-12-04T09:59:13.0613091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0613801Z getattr(self, test_name)() 2025-12-04T09:59:13.0614475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0615170Z fn() 2025-12-04T09:59:13.0615757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0616514Z method(*args, **kwargs) 2025-12-04T09:59:13.0617389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0618154Z method(*args, **kwargs) 2025-12-04T09:59:13.0618861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0619618Z with policy(): 2025-12-04T09:59:13.0620358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0621411Z raise RuntimeError(msg) 2025-12-04T09:59:13.0622810Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.0624151Z 2025-12-04T09:59:13.0624369Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0625398Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0626207Z 2025-12-04T09:59:13.0626487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0627148Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.0627660Z ======================= 1 failed, 26 deselected in 9.78s ======================= 2025-12-04T09:59:13.0628088Z Got exit code 1 2025-12-04T09:59:13.0628362Z Retrying single test... 
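Editor's sketch: the leak report above compares two counters before and after the test, the caching-allocator bytes and the driver-level allocated bytes, and the check is enabled in the repro command via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1. The snippet below is an illustration of that kind of comparison using public torch.cuda APIs only; it is not the actual checker in torch/testing/_internal/common_utils.py, and the variable names and the failure condition are assumptions.

    # Illustrative only: approximates the before/after comparison reported in the
    # leak message above; the real checker in common_utils.py differs in detail.
    import torch

    def cuda_mem_snapshot(device: int):
        allocator_bytes = torch.cuda.memory_allocated(device)     # caching-allocator view
        free_bytes, total_bytes = torch.cuda.mem_get_info(device)
        driver_bytes = total_bytes - free_bytes                    # driver-level view
        return allocator_bytes, driver_bytes

    device = 0
    before_alloc, before_driver = cuda_mem_snapshot(device)
    # ... the test body would run here ...
    torch.cuda.synchronize(device)
    after_alloc, after_driver = cuda_mem_snapshot(device)
    if after_alloc > before_alloc and after_driver > before_driver:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: allocator "
            f"{before_alloc} -> {after_alloc}, driver {before_driver} -> {after_driver}"
        )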
2025-12-04T09:59:13.0629171Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2da5f79ab7711605.xml 2025-12-04T09:59:13.0630106Z ============================= test session starts ============================== 2025-12-04T09:59:13.0630776Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.0631366Z cachedir: .pytest_cache 2025-12-04T09:59:13.0632074Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.0633012Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.0633336Z configfile: pytest.ini 2025-12-04T09:59:13.0633978Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.0634825Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.0635815Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0636706Z Running 1 items in this shard 2025-12-04T09:59:13.0636896Z 2025-12-04T09:59:13.0637817Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda I1204 09:29:17.143000 42218 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 42270 2025-12-04T09:59:13.0639309Z I1204 09:29:17.144000 42218 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 42271 2025-12-04T09:59:13.0640332Z I1204 09:29:17.145000 42218 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 42272 2025-12-04T09:59:13.0641346Z I1204 09:29:17.146000 42218 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 42273 2025-12-04T09:59:13.0643020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0644358Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0646113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0647907Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0649327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0650661Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0652421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0654211Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0655617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0657237Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0658705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0660184Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0662137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0664191Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0666225Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0668227Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0672774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0677212Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0681715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. 
This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0686145Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0690595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0695020Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0700086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.0705123Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0706101Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0707221Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0709014Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0710599Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0712086Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0713429Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0714755Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0716162Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0717567Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0718992Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0720394Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0722179Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0723728Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0725390Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0727679Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 709820416 and is now 743374848. 
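Editor's sketch: the AccumulateGrad stream-mismatch warnings above state that, if the mismatch is intentional, the warning can be turned off. The call below is taken verbatim from the warning text; the surrounding training loop and DDP/FSDP setup are omitted.

    # Only the suppression call named in the warning; everything else is omitted.
    import torch

    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)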
2025-12-04T09:59:13.0729867Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0731021Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0732931Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0734537Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0735625Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0737147Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.0738291Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0739416Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0741089Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0742782Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0744406Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0745934Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0747429Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0749119Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0750568Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0751972Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0753377Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0754738Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0756107Z [rank1]:E1204 09:29:24.133000 42271 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0757547Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0759598Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 602865664 and is now 634322944. 2025-12-04T09:59:13.0761495Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0762525Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0764219Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0765647Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0766712Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0767946Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.0768956Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0769947Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0771450Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0772912Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0774363Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0775703Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0777357Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0779003Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0780591Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0782183Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0783766Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0785350Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0786905Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0788529Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0790715Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.0792611Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0793645Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0795332Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0796756Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0797832Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0799069Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.0800068Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0801092Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0802580Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0804028Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0805486Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0806824Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0808183Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0809594Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0810991Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0812396Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0813864Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0815244Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0816882Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0818476Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0820933Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 
2025-12-04T09:59:13.0823149Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0824327Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0826257Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0827876Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0829089Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0830502Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.0831297Z dist init r=2, world=4 2025-12-04T09:59:13.0831637Z dist init r=1, world=4 2025-12-04T09:59:13.0831917Z dist init r=0, world=4 2025-12-04T09:59:13.0832190Z dist init r=3, world=4 2025-12-04T09:59:13.0833599Z [rank0]:[W1204 09:29:24.260352374 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.0835082Z FAILED [9.3996s] [100%] 2025-12-04T09:59:13.0835261Z 2025-12-04T09:59:13.0835407Z =================================== FAILURES =================================== 2025-12-04T09:59:13.0835966Z ___ TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda ____ 2025-12-04T09:59:13.0836508Z Traceback (most recent call last): 2025-12-04T09:59:13.0837304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.0838073Z self._join_processes(fn) 2025-12-04T09:59:13.0838847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.0839686Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.0840535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.0841478Z raise RuntimeError(error) 2025-12-04T09:59:13.0841897Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.0842346Z Traceback (most recent call last): 2025-12-04T09:59:13.0843079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0843865Z getattr(self, test_name)() 2025-12-04T09:59:13.0844561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0845283Z fn() 2025-12-04T09:59:13.0845930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0846647Z method(*args, **kwargs) 2025-12-04T09:59:13.0847304Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0848015Z method(*args, **kwargs) 2025-12-04T09:59:13.0848680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0849372Z with policy(): 2025-12-04T09:59:13.0850009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0850734Z raise RuntimeError(msg) 2025-12-04T09:59:13.0852049Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.0853293Z 2025-12-04T09:59:13.0853502Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0854451Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0855217Z 2025-12-04T09:59:13.0855464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0855841Z 2025-12-04T09:59:13.0856002Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.0856469Z Traceback (most recent call last): 2025-12-04T09:59:13.0857421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0858215Z getattr(self, test_name)() 2025-12-04T09:59:13.0859003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0859764Z fn() 2025-12-04T09:59:13.0860406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0861161Z method(*args, **kwargs) 2025-12-04T09:59:13.0861863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0862605Z method(*args, **kwargs) 2025-12-04T09:59:13.0863311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0864064Z with policy(): 2025-12-04T09:59:13.0864758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0865518Z raise RuntimeError(msg) 2025-12-04T09:59:13.0866922Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 
2025-12-04T09:59:13.0868243Z 2025-12-04T09:59:13.0868462Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0869626Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0870342Z 2025-12-04T09:59:13.0870576Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0870974Z 2025-12-04T09:59:13.0870978Z 2025-12-04T09:59:13.0871176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.0871728Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.0872778Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2da5f79ab7711605.xml - 2025-12-04T09:59:13.0873779Z =========================== short test summary info ============================ 2025-12-04T09:59:13.0874784Z FAILED [9.3996s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.0875732Z Traceback (most recent call last): 2025-12-04T09:59:13.0876422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0877136Z getattr(self, test_name)() 2025-12-04T09:59:13.0877803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0878484Z fn() 2025-12-04T09:59:13.0879050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0879729Z method(*args, **kwargs) 2025-12-04T09:59:13.0880353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0881021Z method(*args, **kwargs) 2025-12-04T09:59:13.0881639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0882306Z with policy(): 2025-12-04T09:59:13.0882908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0883576Z raise RuntimeError(msg) 2025-12-04T09:59:13.0884855Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 
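Editor's sketch: the ProcessGroupNCCL warning earlier in this run (rank0, ProcessGroupNCCL.cpp:1553) notes that destroy_process_group() was not called before program exit, which can leak resources. A minimal teardown sketch, assuming a process group was initialized elsewhere by the test harness:

    import torch.distributed as dist

    # Guard so the call is a no-op when no process group was ever created.
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()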
2025-12-04T09:59:13.0886034Z 2025-12-04T09:59:13.0886224Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0887116Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0887827Z 2025-12-04T09:59:13.0888065Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0888417Z 2025-12-04T09:59:13.0888559Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.0888929Z Traceback (most recent call last): 2025-12-04T09:59:13.0889625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0890349Z getattr(self, test_name)() 2025-12-04T09:59:13.0891020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0891701Z fn() 2025-12-04T09:59:13.0892275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0892932Z method(*args, **kwargs) 2025-12-04T09:59:13.0893564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0894230Z method(*args, **kwargs) 2025-12-04T09:59:13.0894848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0895545Z with policy(): 2025-12-04T09:59:13.0896150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0897109Z raise RuntimeError(msg) 2025-12-04T09:59:13.0898501Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.0899876Z 2025-12-04T09:59:13.0900092Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0901105Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0901905Z 2025-12-04T09:59:13.0902179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0902755Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.0903251Z ======================= 1 failed, 26 deselected in 9.62s ======================= 2025-12-04T09:59:13.0903668Z Got exit code 1 2025-12-04T09:59:13.0904413Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0905523Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.0906696Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a202ac92fafcf85d.xml 2025-12-04T09:59:13.0907616Z ============================= test session starts ============================== 2025-12-04T09:59:13.0908264Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.0908847Z cachedir: .pytest_cache 2025-12-04T09:59:13.0909630Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.0910356Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.0910673Z configfile: pytest.ini 2025-12-04T09:59:13.0911382Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.0912216Z collecting ... collected 60 items / 1 deselected / 59 selected 2025-12-04T09:59:13.0912669Z stepcurrent: skipping 1 already run items. 2025-12-04T09:59:13.0913014Z Running 26 items in this shard 2025-12-04T09:59:13.0913217Z 2025-12-04T09:59:13.0914197Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda I1204 09:29:31.034000 42555 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 42607 2025-12-04T09:59:13.0915755Z I1204 09:29:31.035000 42555 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 42608 2025-12-04T09:59:13.0916846Z I1204 09:29:31.035000 42555 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 42609 2025-12-04T09:59:13.0917892Z I1204 09:29:31.036000 42555 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 42610 2025-12-04T09:59:13.0919654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0921488Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0922964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0924523Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0926490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.0928539Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0930547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0932550Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0934221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0935546Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0937617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0939619Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0941157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0942710Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0944666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.0946666Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0947425Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0948673Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0950422Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0951982Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0953524Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0954960Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0956540Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0958123Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0959706Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0961248Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0962871Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0964334Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0965793Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0967294Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0969445Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 
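Editor's sketch: the FSDP _init_utils warnings above suggest either calling torch.cuda.set_device() before FSDP initialization or passing an explicit device index instead of the bare "cuda" device_id. The single-process sketch below follows that advice; the toy nn.Linear module, the rendezvous port, and the world size of 1 are placeholders and are not taken from the test.

    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Single-process stand-in for the 4-rank harness used by test_fsdp_core.py.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=0, world_size=1)

    local_rank = 0
    torch.cuda.set_device(local_rank)               # option 1: fix the current device up front
    model = nn.Linear(8, 8).cuda(local_rank)
    fsdp_model = FSDP(model, device_id=local_rank)  # option 2: explicit index, not bare "cuda"

    dist.destroy_process_group()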
2025-12-04T09:59:13.0971565Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0972602Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0974316Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.0975751Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0977098Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0978507Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.0979647Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0980800Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0982476Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0984116Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0985753Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0987298Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0988920Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0990503Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0991913Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0993326Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0994723Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0996095Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0997474Z [rank1]:E1204 09:29:37.892000 42608 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0998892Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1000921Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 611254272 and is now 634322944. 2025-12-04T09:59:13.1002827Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1003898Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1005601Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1007031Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1008114Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1009381Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1010390Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1011389Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1012876Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1014330Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1015773Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1017475Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1019009Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1020594Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1022402Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1023992Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1025587Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1027142Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1028694Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1030281Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1032633Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1034680Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1035707Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1037403Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1038827Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1039940Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1041188Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1042192Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1043180Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1044648Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1046143Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1047599Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1048980Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1050311Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1051711Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1053116Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1054526Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1055934Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1057633Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1059179Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1060774Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1063092Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 604962816 and is now 634322944. 
2025-12-04T09:59:13.1065229Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1066383Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1068292Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1069933Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1071021Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1072258Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1072939Z dist init r=1, world=4 2025-12-04T09:59:13.1073179Z dist init r=0, world=4 2025-12-04T09:59:13.1073412Z dist init r=2, world=4 2025-12-04T09:59:13.1073636Z dist init r=3, world=4 2025-12-04T09:59:13.1074807Z [rank0]:[W1204 09:29:38.912336790 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1076068Z FAILED [9.1848s] [ 3%] 2025-12-04T09:59:13.1076221Z 2025-12-04T09:59:13.1076358Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1077464Z ___ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda ____ 2025-12-04T09:59:13.1077954Z Traceback (most recent call last): 2025-12-04T09:59:13.1078649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1079349Z self._join_processes(fn) 2025-12-04T09:59:13.1080039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1080794Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1081570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1082319Z raise RuntimeError(error) 2025-12-04T09:59:13.1082700Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.1083126Z Traceback (most recent call last): 2025-12-04T09:59:13.1083814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1084506Z getattr(self, test_name)() 2025-12-04T09:59:13.1085156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1085830Z fn() 2025-12-04T09:59:13.1086394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1087047Z method(*args, **kwargs) 2025-12-04T09:59:13.1087667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.1088328Z method(*args, **kwargs) 2025-12-04T09:59:13.1088939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1089626Z with policy(): 2025-12-04T09:59:13.1090230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1090901Z raise RuntimeError(msg) 2025-12-04T09:59:13.1092137Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1093316Z 2025-12-04T09:59:13.1093504Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1094398Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1095099Z 2025-12-04T09:59:13.1095385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1095738Z 2025-12-04T09:59:13.1095745Z 2025-12-04T09:59:13.1095942Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1096567Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1097927Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a202ac92fafcf85d.xml - 2025-12-04T09:59:13.1099030Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1100156Z FAILED [9.1848s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.1101287Z Traceback (most recent call last): 2025-12-04T09:59:13.1102069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1102865Z getattr(self, test_name)() 2025-12-04T09:59:13.1103629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1104386Z fn() 2025-12-04T09:59:13.1105019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1105769Z method(*args, **kwargs) 2025-12-04T09:59:13.1106466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1107211Z method(*args, **kwargs) 2025-12-04T09:59:13.1107915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1108647Z with policy(): 2025-12-04T09:59:13.1109388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1110063Z raise RuntimeError(msg) 2025-12-04T09:59:13.1111299Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! 
Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1112472Z 2025-12-04T09:59:13.1112658Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1113549Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1114264Z 2025-12-04T09:59:13.1114498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1115012Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1115472Z ======================= 1 failed, 1 deselected in 9.40s ======================== 2025-12-04T09:59:13.1115841Z Got exit code 1 2025-12-04T09:59:13.1116068Z Retrying single test... 2025-12-04T09:59:13.1116774Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bacdfd4e137b31c0.xml 2025-12-04T09:59:13.1117592Z ============================= test session starts ============================== 2025-12-04T09:59:13.1118160Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1118680Z cachedir: .pytest_cache 2025-12-04T09:59:13.1119290Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1119962Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1120262Z configfile: pytest.ini 2025-12-04T09:59:13.1121081Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1122152Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.1123242Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1124213Z Running 1 items in this shard 2025-12-04T09:59:13.1124422Z 2025-12-04T09:59:13.1125455Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda I1204 09:29:44.923000 42892 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 42944 2025-12-04T09:59:13.1127094Z I1204 09:29:44.924000 42892 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 42945 2025-12-04T09:59:13.1128281Z I1204 09:29:44.925000 42892 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 42946 2025-12-04T09:59:13.1129397Z I1204 09:29:44.926000 42892 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 42947 2025-12-04T09:59:13.1131313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1132801Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1134734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1136586Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1138295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1139781Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1141727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1143726Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1145304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1146791Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1148847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1150734Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1152126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1153439Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1155165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
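The repeated `_init_utils.py:571` UserWarning fires because FSDP receives `device_id` as the bare string "cuda" with no index, so it falls back to the current device on each rank. The warning text itself names the fix; the snippet below is a minimal sketch of both options it suggests, assuming `rank` is the local rank and the default process group has already been initialized on each worker.

# Illustrative sketch of silencing the FSDP `device_id` warning, per its own advice.
# Assumes torch.distributed.init_process_group(...) has already run on this rank.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(rank: int) -> FSDP:
    # Option 1: make the current device explicit before constructing FSDP.
    torch.cuda.set_device(rank)
    model = nn.Linear(16, 16)
    # Option 2: pass a device_id with an explicit index instead of the bare "cuda".
    return FSDP(model, device_id=torch.device("cuda", rank))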
2025-12-04T09:59:13.1156933Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1157589Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1158617Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1160100Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1161587Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1163035Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1164381Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1165707Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1167113Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1168529Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1169938Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1171347Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1172709Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1174112Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1175540Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1177918Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 711917568 and is now 743374848. 
2025-12-04T09:59:13.1180055Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1181251Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1183178Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1184787Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1185999Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1187383Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.1188553Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1189704Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1191213Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1192658Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1194113Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1195463Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1196796Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1198197Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1199591Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1201005Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1202408Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1203801Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1205177Z [rank1]:E1204 09:29:51.856000 42945 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1206582Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1208597Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 600768512 and is now 634322944. 2025-12-04T09:59:13.1210530Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1211568Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1213260Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1214684Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1215759Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1217402Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1218535Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1219689Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1221608Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1223253Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1224898Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1226423Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1227912Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1229496Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1231082Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1232659Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1234260Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1235626Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1237009Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1238430Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1240495Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1242396Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1243423Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1245126Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1246598Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1247689Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1248936Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1249977Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1250975Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1252462Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1253924Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1255373Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1256985Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1258489Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1260071Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1261661Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1263272Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1264875Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1266421Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1267973Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1269601Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1271695Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 
2025-12-04T09:59:13.1273594Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1274622Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1276321Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1277787Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1278865Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1280138Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1280834Z dist init r=2, world=4 2025-12-04T09:59:13.1281079Z dist init r=3, world=4 2025-12-04T09:59:13.1281312Z dist init r=1, world=4 2025-12-04T09:59:13.1281551Z dist init r=0, world=4 2025-12-04T09:59:13.1282721Z [rank0]:[W1204 09:29:52.879508683 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1283936Z FAILED [9.4296s] [100%] 2025-12-04T09:59:13.1284098Z 2025-12-04T09:59:13.1284230Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1284933Z ___ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda ____ 2025-12-04T09:59:13.1285454Z Traceback (most recent call last): 2025-12-04T09:59:13.1286187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1286928Z self._join_processes(fn) 2025-12-04T09:59:13.1287671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1288476Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1289291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1290094Z raise RuntimeError(error) 2025-12-04T09:59:13.1290517Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.1290995Z Traceback (most recent call last): 2025-12-04T09:59:13.1291729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1292470Z getattr(self, test_name)() 2025-12-04T09:59:13.1293167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1293876Z fn() 2025-12-04T09:59:13.1294476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1295362Z method(*args, **kwargs) 2025-12-04T09:59:13.1296040Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1296862Z method(*args, **kwargs) 2025-12-04T09:59:13.1297786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1298539Z with policy(): 2025-12-04T09:59:13.1299208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1299968Z raise RuntimeError(msg) 2025-12-04T09:59:13.1301367Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.1302699Z 2025-12-04T09:59:13.1302921Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1303954Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1304767Z 2025-12-04T09:59:13.1305033Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1305472Z 2025-12-04T09:59:13.1305637Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.1306050Z Traceback (most recent call last): 2025-12-04T09:59:13.1306820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1307616Z getattr(self, test_name)() 2025-12-04T09:59:13.1308361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1309243Z fn() 2025-12-04T09:59:13.1309862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1310708Z method(*args, **kwargs) 2025-12-04T09:59:13.1311382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1312278Z method(*args, **kwargs) 2025-12-04T09:59:13.1312967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1313685Z with policy(): 2025-12-04T09:59:13.1314343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1315079Z raise RuntimeError(msg) 2025-12-04T09:59:13.1316432Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 
2025-12-04T09:59:13.1317710Z 2025-12-04T09:59:13.1317929Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1318967Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1319745Z 2025-12-04T09:59:13.1319996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1320392Z 2025-12-04T09:59:13.1320397Z 2025-12-04T09:59:13.1320611Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1321677Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1322887Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bacdfd4e137b31c0.xml - 2025-12-04T09:59:13.1323992Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1325207Z FAILED [9.4296s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.1326280Z Traceback (most recent call last): 2025-12-04T09:59:13.1327065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1327847Z getattr(self, test_name)() 2025-12-04T09:59:13.1328593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1329360Z fn() 2025-12-04T09:59:13.1329993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1330788Z method(*args, **kwargs) 2025-12-04T09:59:13.1331625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1332495Z method(*args, **kwargs) 2025-12-04T09:59:13.1333577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1334619Z with policy(): 2025-12-04T09:59:13.1335317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1336173Z raise RuntimeError(msg) 2025-12-04T09:59:13.1337914Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 
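The parent-side traceback (wrapper -> _join_processes -> _check_return_codes) shows the pattern that turns a child's exit code 10 into the pytest failure: every rank runs in its own process, the parent joins them, and any non-zero exit code is re-raised as a RuntimeError. The sketch below reproduces only that shape with the standard library; it is not the common_distributed harness, and _run_rank / run_world are made-up names.

# Illustrative sketch of the spawn/join/exit-code pattern visible in the traceback.
import multiprocessing as mp

def _run_rank(rank: int) -> None:
    # Stand-in for the per-rank test body; exit code 10 signals a detected leak here.
    raise SystemExit(10)

def run_world(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=_run_rank, args=(r,)) for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

if __name__ == "__main__":
    run_world()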
2025-12-04T09:59:13.1339301Z 2025-12-04T09:59:13.1339644Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1340756Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1341695Z 2025-12-04T09:59:13.1342004Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1342479Z 2025-12-04T09:59:13.1342718Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.1343276Z Traceback (most recent call last): 2025-12-04T09:59:13.1344146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1345069Z getattr(self, test_name)() 2025-12-04T09:59:13.1345959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1346857Z fn() 2025-12-04T09:59:13.1347584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1348483Z method(*args, **kwargs) 2025-12-04T09:59:13.1349396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1350213Z method(*args, **kwargs) 2025-12-04T09:59:13.1351012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1351795Z with policy(): 2025-12-04T09:59:13.1352538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1353286Z raise RuntimeError(msg) 2025-12-04T09:59:13.1354635Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1355906Z 2025-12-04T09:59:13.1356148Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1357201Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1357948Z 2025-12-04T09:59:13.1358244Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1358885Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1359440Z ======================= 1 failed, 26 deselected in 9.65s ======================= 2025-12-04T09:59:13.1359898Z Got exit code 1 2025-12-04T09:59:13.1360349Z Retrying single test... 
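After a failed session the runner reports "Got exit code 1" and retries only the failed test, which is why a fresh pytest session follows that collects the same 60 items but runs a single node id. The loop below sketches that retry-the-single-test behaviour with plain pytest; it is an approximation of what the log shows, not PyTorch's actual test runner, and NODE_ID / max_attempts are illustrative.

# Illustrative sketch of the "Got exit code 1" -> "Retrying single test..." loop above.
import subprocess
import sys

NODE_ID = ("test/distributed/fsdp/test_fsdp_core.py"
           "::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda")

def retry_single_test(max_attempts: int = 3) -> int:
    code = 1
    for attempt in range(max_attempts):
        code = subprocess.run([sys.executable, "-m", "pytest", "-x", NODE_ID]).returncode
        if code == 0:
            break
        print(f"Got exit code {code}")
        if attempt < max_attempts - 1:
            print("Retrying single test...")
    return code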
2025-12-04T09:59:13.1361139Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2f84fddbafa0e0f3.xml 2025-12-04T09:59:13.1362051Z ============================= test session starts ============================== 2025-12-04T09:59:13.1362843Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1363430Z cachedir: .pytest_cache 2025-12-04T09:59:13.1364156Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1365047Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1365462Z configfile: pytest.ini 2025-12-04T09:59:13.1366152Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1367103Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.1368193Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1369196Z Running 1 items in this shard 2025-12-04T09:59:13.1369480Z 2025-12-04T09:59:13.1370441Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda I1204 09:29:58.863000 43229 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 43281 2025-12-04T09:59:13.1372050Z I1204 09:29:58.864000 43229 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 43282 2025-12-04T09:59:13.1373181Z I1204 09:29:58.865000 43229 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 43283 2025-12-04T09:59:13.1374308Z I1204 09:29:58.866000 43229 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 43284 2025-12-04T09:59:13.1376099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1377912Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1380082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.1382240Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1383940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1385573Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1387123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1388863Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1390857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1392731Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1394267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1395689Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1397518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1399498Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1401382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.1403289Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1404089Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1405420Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1407139Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1409048Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1410753Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1412397Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1413958Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1415633Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1417610Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1419325Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1421341Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1423049Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1424790Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1426493Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1428971Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 707723264 and is now 743374848. 
2025-12-04T09:59:13.1431305Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1432678Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1434704Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1436254Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1437473Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1438862Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.1439939Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1441060Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1442734Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1444308Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1445916Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1447375Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1448827Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1450415Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1451934Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1453481Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1455030Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1456611Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1458470Z [rank2]:E1204 09:30:05.742000 43283 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1460309Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1462770Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1465082Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1466341Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1468386Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1470229Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1471452Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1472767Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1473891Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1475021Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1476620Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1478277Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1479838Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1481312Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1482787Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1484322Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1485853Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1487384Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1488910Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1490370Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1491913Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1493524Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1495687Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 609157120 and is now 634322944. 2025-12-04T09:59:13.1498113Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1499398Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1501482Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1503258Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1504620Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1506115Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1507389Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1508757Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1510527Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1512203Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1513731Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1515235Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1516685Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1518221Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1519823Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1521672Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1523463Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1525201Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1526955Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1528687Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1531177Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 
2025-12-04T09:59:13.1533445Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1534722Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1536665Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1538585Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1539903Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1541447Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1542364Z dist init r=0, world=4 2025-12-04T09:59:13.1542811Z dist init r=1, world=4 2025-12-04T09:59:13.1543154Z dist init r=3, world=4 2025-12-04T09:59:13.1543544Z dist init r=2, world=4 2025-12-04T09:59:13.1545160Z [rank0]:[W1204 09:30:06.755426709 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1546680Z FAILED [9.2234s] [100%] 2025-12-04T09:59:13.1546897Z 2025-12-04T09:59:13.1547101Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1547848Z ___ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda ____ 2025-12-04T09:59:13.1548519Z Traceback (most recent call last): 2025-12-04T09:59:13.1549558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1550421Z self._join_processes(fn) 2025-12-04T09:59:13.1551241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1552149Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1553050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1553939Z raise RuntimeError(error) 2025-12-04T09:59:13.1554426Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.1555066Z Traceback (most recent call last): 2025-12-04T09:59:13.1555836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1556633Z getattr(self, test_name)() 2025-12-04T09:59:13.1557453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1558288Z fn() 2025-12-04T09:59:13.1558902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1559739Z method(*args, **kwargs) 2025-12-04T09:59:13.1560479Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1561223Z method(*args, **kwargs) 2025-12-04T09:59:13.1562014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1562187Z with policy(): 2025-12-04T09:59:13.1562680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1562867Z raise RuntimeError(msg) 2025-12-04T09:59:13.1564020Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.1564029Z 2025-12-04T09:59:13.1564288Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1564969Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1564977Z 2025-12-04T09:59:13.1565252Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1565257Z 2025-12-04T09:59:13.1565262Z 2025-12-04T09:59:13.1565541Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1565791Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1618870Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2f84fddbafa0e0f3.xml - 2025-12-04T09:59:13.1619172Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1620136Z FAILED [9.2234s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.1620269Z Traceback (most recent call last): 2025-12-04T09:59:13.1621066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1621189Z getattr(self, test_name)() 2025-12-04T09:59:13.1621751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1621844Z fn() 2025-12-04T09:59:13.1622357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1622481Z method(*args, **kwargs) 2025-12-04T09:59:13.1622988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1623194Z method(*args, **kwargs) 2025-12-04T09:59:13.1623713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1623815Z with policy(): 2025-12-04T09:59:13.1624334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1624444Z raise RuntimeError(msg) 2025-12-04T09:59:13.1625642Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.1625703Z 2025-12-04T09:59:13.1625920Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1626586Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1626595Z 2025-12-04T09:59:13.1626915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1627099Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1627281Z ======================= 1 failed, 26 deselected in 9.44s ======================= 2025-12-04T09:59:13.1627376Z Got exit code 1 2025-12-04T09:59:13.1627969Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1628380Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.1628997Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8511307d41418b77.xml 2025-12-04T09:59:13.1629162Z ============================= test session starts ============================== 2025-12-04T09:59:13.1629530Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1629642Z cachedir: .pytest_cache 2025-12-04T09:59:13.1630164Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1630289Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1630396Z configfile: pytest.ini 2025-12-04T09:59:13.1630939Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1631154Z collecting ... collected 60 items / 2 deselected / 58 selected 2025-12-04T09:59:13.1631294Z stepcurrent: skipping 2 already run items. 
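[editor's note] The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit") points at the usual explicit-teardown pattern. A minimal sketch, assuming a process group has already been initialized elsewhere (rank/world-size wiring omitted):

import torch.distributed as dist

def teardown_process_group():
    # Tear the group down explicitly instead of relying on program exit,
    # as the warning recommends.
    if dist.is_initialized():
        dist.barrier()                 # optional: let all ranks finish outstanding work first
        dist.destroy_process_group()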
2025-12-04T09:59:13.1631416Z Running 25 items in this shard 2025-12-04T09:59:13.1631422Z 2025-12-04T09:59:13.1632632Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda I1204 09:30:12.774000 43566 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 43618 2025-12-04T09:59:13.1633219Z I1204 09:30:12.774000 43566 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 43619 2025-12-04T09:59:13.1633661Z I1204 09:30:12.775000 43566 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 43620 2025-12-04T09:59:13.1634104Z I1204 09:30:12.776000 43566 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 43621 2025-12-04T09:59:13.1635207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1635324Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1636247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1636408Z {} 2025-12-04T09:59:13.1636704Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1636894Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1638412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1638605Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1639697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1639853Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1640958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1641078Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1641959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1642118Z {} 2025-12-04T09:59:13.1642411Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 
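[editor's note] The transformer.py UserWarning repeated above is about the encoder layer being built without batch_first, which disables the nested-tensor fast path. A small sketch of the constructor change it suggests; the dimensions are placeholders, not taken from the test model.

import torch.nn as nn

# batch_first=True lets TransformerEncoder keep use_nested_tensor enabled,
# which is what the UserWarning above is pointing at.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)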
2025-12-04T09:59:13.1642605Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1644140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1644288Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1645167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1645331Z {} 2025-12-04T09:59:13.1645661Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1645865Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1647380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1647534Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1648655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1648773Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1649674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1649829Z {} 2025-12-04T09:59:13.1650120Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1650309Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1651827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
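[editor's note] The _init_utils warning above spells out its own fix: either make the per-rank device current before constructing FSDP, or hand FSDP an indexed device instead of bare "cuda". A minimal sketch of both options, assuming `rank` is this process's local rank and the process group is already initialized; wrap_model is a hypothetical helper.

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model: torch.nn.Module, rank: int) -> FSDP:
    # Option 1: set the current device per rank before FSDP init, as the warning recommends.
    torch.cuda.set_device(rank)
    # Option 2 (equivalent here): pass an explicitly indexed device as device_id.
    return FSDP(model, device_id=torch.device("cuda", rank))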
2025-12-04T09:59:13.1652035Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1652445Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1652933Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1653822Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1654274Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1655165Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1655519Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1656468Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1657107Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1658085Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1658612Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1659572Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1660035Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1661003Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1661509Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1663269Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 718209024 and is now 732889088. 
2025-12-04T09:59:13.1663655Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1664315Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1665520Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1665917Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1666635Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1667223Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.1667671Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1668211Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1669291Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1669748Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1670629Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1670982Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1671843Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1672279Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1673167Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1673599Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1674452Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1674861Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1675715Z [rank1]:E1204 09:30:19.585000 43619 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1676194Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1677719Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 611254272 and is now 623837184. 2025-12-04T09:59:13.1678049Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1678633Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1679726Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1680081Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1680719Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1681212Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1681611Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1682096Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1682990Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1683444Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1684331Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1684682Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1685552Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1686005Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1687068Z [rank2]:E1204 09:30:19.585000 43620 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1687525Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1688423Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1688858Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1689797Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1690271Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1691887Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T09:59:13.1692268Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1692893Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1694059Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1694400Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1695071Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1695590Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1696017Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1696608Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1697781Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1698290Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1699294Z [rank3]:E1204 09:30:19.588000 43621 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1699696Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1700704Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1701193Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1702161Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1702648Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1703638Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1704097Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1705058Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1705560Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1707280Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 
2025-12-04T09:59:13.1707686Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1708370Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1709714Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1710040Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1710676Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1711170Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1711265Z dist init r=1, world=4 2025-12-04T09:59:13.1711362Z dist init r=2, world=4 2025-12-04T09:59:13.1711448Z dist init r=3, world=4 2025-12-04T09:59:13.1711535Z dist init r=0, world=4 2025-12-04T09:59:13.1712573Z [rank0]:[W1204 09:30:19.609914518 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1712664Z FAILED [8.5769s] [ 4%] 2025-12-04T09:59:13.1712669Z 2025-12-04T09:59:13.1712809Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1713134Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda _ 2025-12-04T09:59:13.1713245Z Traceback (most recent call last): 2025-12-04T09:59:13.1713774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1713877Z self._join_processes(fn) 2025-12-04T09:59:13.1714400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1714533Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1715075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1715191Z raise RuntimeError(error) 2025-12-04T09:59:13.1715400Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.1715511Z Traceback (most recent call last): 2025-12-04T09:59:13.1715999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1716123Z getattr(self, test_name)() 2025-12-04T09:59:13.1716600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1716691Z fn() 2025-12-04T09:59:13.1717142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1717246Z method(*args, **kwargs) 2025-12-04T09:59:13.1717887Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1717991Z method(*args, **kwargs) 2025-12-04T09:59:13.1718471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1718597Z with policy(): 2025-12-04T09:59:13.1719078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1719187Z raise RuntimeError(msg) 2025-12-04T09:59:13.1720377Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:59:13.1720412Z 2025-12-04T09:59:13.1720624Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1721697Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1721705Z 2025-12-04T09:59:13.1721985Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1721996Z 2025-12-04T09:59:13.1722000Z 2025-12-04T09:59:13.1722295Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1722565Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1723365Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8511307d41418b77.xml - 2025-12-04T09:59:13.1723542Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1724446Z FAILED [8.5769s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.1724573Z Traceback (most recent call last): 2025-12-04T09:59:13.1725120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1725237Z getattr(self, test_name)() 2025-12-04T09:59:13.1725772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1725860Z fn() 2025-12-04T09:59:13.1726436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1726543Z method(*args, **kwargs) 2025-12-04T09:59:13.1727045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1727155Z method(*args, **kwargs) 2025-12-04T09:59:13.1727659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1727759Z with policy(): 2025-12-04T09:59:13.1728269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1728377Z raise RuntimeError(msg) 2025-12-04T09:59:13.1729693Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:59:13.1729703Z 2025-12-04T09:59:13.1729915Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1730664Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1730669Z 2025-12-04T09:59:13.1730933Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1731106Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1731324Z ======================= 1 failed, 2 deselected in 8.80s ======================== 2025-12-04T09:59:13.1731420Z Got exit code 1 2025-12-04T09:59:13.1731530Z Retrying single test... 2025-12-04T09:59:13.1732149Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3768a5b2a44119fc.xml 2025-12-04T09:59:13.1732346Z ============================= test session starts ============================== 2025-12-04T09:59:13.1732703Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1732810Z cachedir: .pytest_cache 2025-12-04T09:59:13.1733331Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1733452Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1733554Z configfile: pytest.ini 2025-12-04T09:59:13.1734197Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1734409Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.1735209Z stepcurrent: skipping 2 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1735329Z Running 1 items in this shard 2025-12-04T09:59:13.1735334Z 2025-12-04T09:59:13.1736475Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda I1204 09:30:26.204000 43887 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 43939 2025-12-04T09:59:13.1737152Z I1204 09:30:26.204000 43887 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 43940 2025-12-04T09:59:13.1737647Z I1204 09:30:26.205000 43887 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 43941 2025-12-04T09:59:13.1738149Z I1204 09:30:26.206000 43887 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 43942 2025-12-04T09:59:13.1739433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1739561Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1740569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1740740Z {} 2025-12-04T09:59:13.1741067Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1741285Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1743037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1743216Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1744442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1744580Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1746130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1746304Z {} 2025-12-04T09:59:13.1746634Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1746877Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1748814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1749084Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1750201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1750314Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1751195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1751353Z {} 2025-12-04T09:59:13.1751633Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1751827Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1753347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1753524Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1754630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1754738Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1755636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1755787Z {} 2025-12-04T09:59:13.1756076Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1756287Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1757813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
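[editor's note] The _wrap_utils warnings repeated above are emitted when one FSDP call combines a MixedPrecision config with an auto_wrap_policy. A hedged sketch of such a call; the module type and wrap policy are illustrative choices, not taken from the test, and it assumes torch.distributed is already initialized with a CUDA device current.

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import ModuleWrapPolicy

def build_fsdp(model: nn.Module) -> FSDP:
    # Combining mixed_precision with an auto_wrap_policy is the code path
    # the warning above refers to.
    return FSDP(
        model,
        auto_wrap_policy=ModuleWrapPolicy({nn.TransformerEncoderLayer}),  # wrap each layer as its own FSDP unit
        mixed_precision=MixedPrecision(param_dtype=torch.float16),
    )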
2025-12-04T09:59:13.1757969Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1758375Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1758852Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1759770Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1760249Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1761145Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1761500Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1762357Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1762795Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1763656Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1764095Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1764946Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1765347Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1766231Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1766678Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1768206Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 720306176 and is now 732889088. 
2025-12-04T09:59:13.1768537Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1769153Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1770218Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1770542Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1771177Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1771667Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.1772091Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1772577Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1773485Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1773933Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1774817Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1775172Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1776035Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1776540Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1777671Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1778156Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1779114Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1779608Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1780576Z [rank1]:E1204 09:30:32.998000 43940 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1781080Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1782796Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T09:59:13.1783200Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1783867Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1785063Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1785425Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1786135Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1786712Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1787162Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1787724Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1788833Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1789407Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1790296Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1790649Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1791506Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1791936Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1792791Z [rank2]:E1204 09:30:33.000000 43941 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1793220Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1794108Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1794512Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1795361Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1795801Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1797364Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T09:59:13.1797696Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1798280Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1799340Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1799687Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1800325Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1800838Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1801237Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1801713Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1802594Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1803044Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1803929Z [rank3]:E1204 09:30:33.002000 43942 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1804281Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1805138Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1805567Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1806432Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1806883Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1807732Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1808131Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1808983Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1809426Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1810971Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 
2025-12-04T09:59:13.1811307Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1811890Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1812952Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1813309Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1813966Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1814454Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1814541Z dist init r=0, world=4 2025-12-04T09:59:13.1814626Z dist init r=3, world=4 2025-12-04T09:59:13.1814719Z dist init r=2, world=4 2025-12-04T09:59:13.1814805Z dist init r=1, world=4 2025-12-04T09:59:13.1815841Z [rank0]:[W1204 09:30:33.010469243 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1815930Z FAILED [8.5029s] [100%] 2025-12-04T09:59:13.1815938Z 2025-12-04T09:59:13.1816067Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1816463Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda _ 2025-12-04T09:59:13.1816571Z Traceback (most recent call last): 2025-12-04T09:59:13.1817284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1817426Z self._join_processes(fn) 2025-12-04T09:59:13.1818008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1818160Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1818767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1818890Z raise RuntimeError(error) 2025-12-04T09:59:13.1819162Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.1819282Z Traceback (most recent call last): 2025-12-04T09:59:13.1819823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1819942Z getattr(self, test_name)() 2025-12-04T09:59:13.1820476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1820563Z fn() 2025-12-04T09:59:13.1821321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1821439Z method(*args, **kwargs) 2025-12-04T09:59:13.1821950Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1822122Z method(*args, **kwargs) 2025-12-04T09:59:13.1822626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1822733Z with policy(): 2025-12-04T09:59:13.1823237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1823344Z raise RuntimeError(msg) 2025-12-04T09:59:13.1824625Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 720306176 and is now 732889088. 2025-12-04T09:59:13.1824673Z 2025-12-04T09:59:13.1824888Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1825647Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1825655Z 2025-12-04T09:59:13.1825956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1825963Z 2025-12-04T09:59:13.1825967Z 2025-12-04T09:59:13.1826193Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1826458Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1827253Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3768a5b2a44119fc.xml - 2025-12-04T09:59:13.1827428Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1828332Z FAILED [8.5029s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.1828459Z Traceback (most recent call last): 2025-12-04T09:59:13.1829009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1829124Z getattr(self, test_name)() 2025-12-04T09:59:13.1829668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1829754Z fn() 2025-12-04T09:59:13.1830270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1830373Z method(*args, **kwargs) 2025-12-04T09:59:13.1830881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1830993Z method(*args, **kwargs) 2025-12-04T09:59:13.1831499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1831592Z with policy(): 2025-12-04T09:59:13.1832151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1832268Z raise RuntimeError(msg) 2025-12-04T09:59:13.1833619Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 720306176 and is now 732889088. 2025-12-04T09:59:13.1833625Z 2025-12-04T09:59:13.1833826Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1834522Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1834537Z 2025-12-04T09:59:13.1834820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1834991Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1835161Z ======================= 1 failed, 26 deselected in 8.72s ======================= 2025-12-04T09:59:13.1835252Z Got exit code 1 2025-12-04T09:59:13.1835349Z Retrying single test... 2025-12-04T09:59:13.1835943Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31ee953fde08a139.xml 2025-12-04T09:59:13.1836092Z ============================= test session starts ============================== 2025-12-04T09:59:13.1836593Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1836727Z cachedir: .pytest_cache 2025-12-04T09:59:13.1837220Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1837345Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1837446Z configfile: pytest.ini 2025-12-04T09:59:13.1837960Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1838255Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.1839048Z stepcurrent: skipping 2 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1839163Z Running 1 items in this shard 2025-12-04T09:59:13.1839168Z 2025-12-04T09:59:13.1840228Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda I1204 09:30:39.564000 44208 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 44260 2025-12-04T09:59:13.1840718Z I1204 09:30:39.565000 44208 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 44261 2025-12-04T09:59:13.1841197Z I1204 09:30:39.565000 44208 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 44262 2025-12-04T09:59:13.1841772Z I1204 09:30:39.566000 44208 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 44263 2025-12-04T09:59:13.1842959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1843075Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1844243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1844403Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1845339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1845510Z {} 2025-12-04T09:59:13.1845809Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1846016Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1846949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1847112Z {} 2025-12-04T09:59:13.1847440Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1847641Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1849260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1849415Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1851018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1851199Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1852396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1852522Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1853453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1853622Z {} 2025-12-04T09:59:13.1853922Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1854120Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1855742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1855897Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1857339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1857473Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1858514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1858687Z {} 2025-12-04T09:59:13.1859004Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1859227Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1860931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
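The enable_nested_tensor warning repeated throughout this output comes from building a TransformerEncoder whose layers keep the default batch_first=False; constructing the layer with batch_first=True is what the message asks for. A small, standalone illustration (not the model used by the test):

    import torch.nn as nn

    # batch_first=True lets the nested-tensor fast path apply, silencing the warning.
    encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=2, enable_nested_tensor=True)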
2025-12-04T09:59:13.1861108Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1861595Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1862139Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1863149Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1863661Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1864652Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1865078Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1866051Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1866566Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1867528Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1868020Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1869183Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1869588Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1870441Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1870886Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1872418Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 716111872 and is now 732889088. 
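The "Both mixed precision and an auto_wrap_policy were specified to FSDP" warning in the setup output refers to configurations that combine a mixed-precision policy with automatic wrapping. A condensed, hypothetical example of such a configuration (layer classes and dtypes chosen arbitrarily for illustration; this is not the test's own code):

    import functools
    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
    from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

    mp_policy = MixedPrecision(
        param_dtype=torch.float16, reduce_dtype=torch.float16, buffer_dtype=torch.float16
    )
    wrap_policy = functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={nn.TransformerEncoderLayer, nn.TransformerDecoderLayer},
    )
    # wrapped = FSDP(model, auto_wrap_policy=wrap_policy, mixed_precision=mp_policy)

With this combination, FSDP wraps certain submodule types separately and disables mixed precision for them, which is what the warning (with its empty set of affected types here) is reporting.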
2025-12-04T09:59:13.1872769Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1873366Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1874434Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1874760Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1875428Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1875922Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.1876322Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1876796Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1877692Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1878167Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1879056Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1879431Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1880292Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1880717Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1881568Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1882009Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1882858Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1883265Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1884121Z [rank2]:E1204 09:30:46.271000 44262 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1884561Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1886107Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T09:59:13.1886435Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1887024Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1888082Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1888438Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1889074Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1889567Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1889961Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1890429Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1891323Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1891796Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1892713Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1893059Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1893916Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1894346Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1895198Z [rank1]:E1204 09:30:46.271000 44261 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1895642Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1896566Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1897179Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1898144Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1898682Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1900399Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T09:59:13.1900764Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1901432Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1902661Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1903031Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1903747Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1904297Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1904747Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1905305Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1906315Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1906846Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1907838Z [rank3]:E1204 09:30:46.273000 44263 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1908234Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1909305Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1909766Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1910663Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1911129Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1912025Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1912455Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1913384Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1913856Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1915468Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 
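The NCCL warning earlier in this log ("destroy_process_group() was not called before program exit") points at missing teardown of the process group. In user code the usual pattern is a paired init/destroy; the snippet below is only a generic sketch of that pattern, not the test harness's teardown logic:

    import torch.distributed as dist

    def main(rank: int, world_size: int):
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            pass  # training / test body goes here
        finally:
            dist.destroy_process_group()  # explicit teardown avoids the NCCL shutdown warning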
2025-12-04T09:59:13.1915808Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1916462Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1917772Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1918127Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1918817Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1919382Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1919479Z dist init r=1, world=4 2025-12-04T09:59:13.1919575Z dist init r=2, world=4 2025-12-04T09:59:13.1919674Z dist init r=0, world=4 2025-12-04T09:59:13.1919850Z dist init r=3, world=4 2025-12-04T09:59:13.1921493Z [rank0]:[W1204 09:30:46.287123994 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1921614Z FAILED [8.6814s] [100%] 2025-12-04T09:59:13.1921620Z 2025-12-04T09:59:13.1921767Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1922129Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda _ 2025-12-04T09:59:13.1922247Z Traceback (most recent call last): 2025-12-04T09:59:13.1922800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1922925Z self._join_processes(fn) 2025-12-04T09:59:13.1923518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1923667Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1924272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1924383Z raise RuntimeError(error) 2025-12-04T09:59:13.1924623Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.1924743Z Traceback (most recent call last): 2025-12-04T09:59:13.1925281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1925401Z getattr(self, test_name)() 2025-12-04T09:59:13.1925933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1926030Z fn() 2025-12-04T09:59:13.1926612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1926723Z method(*args, **kwargs) 2025-12-04T09:59:13.1927241Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1927346Z method(*args, **kwargs) 2025-12-04T09:59:13.1927852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1927955Z with policy(): 2025-12-04T09:59:13.1928469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1928591Z raise RuntimeError(msg) 2025-12-04T09:59:13.1929901Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 716111872 and is now 732889088. 2025-12-04T09:59:13.1929911Z 2025-12-04T09:59:13.1930125Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1930870Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1930876Z 2025-12-04T09:59:13.1931145Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1931150Z 2025-12-04T09:59:13.1931155Z 2025-12-04T09:59:13.1931385Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1931688Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1932499Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31ee953fde08a139.xml - 2025-12-04T09:59:13.1932672Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1933740Z FAILED [8.6814s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.1933868Z Traceback (most recent call last): 2025-12-04T09:59:13.1934518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1934636Z getattr(self, test_name)() 2025-12-04T09:59:13.1935139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1935226Z fn() 2025-12-04T09:59:13.1935713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1935814Z method(*args, **kwargs) 2025-12-04T09:59:13.1936352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1936479Z method(*args, **kwargs) 2025-12-04T09:59:13.1937141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1937246Z with policy(): 2025-12-04T09:59:13.1937749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1937856Z raise RuntimeError(msg) 2025-12-04T09:59:13.1939134Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 716111872 and is now 732889088. 2025-12-04T09:59:13.1939144Z 2025-12-04T09:59:13.1939394Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1940152Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1940158Z 2025-12-04T09:59:13.1940420Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1940596Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1940779Z ======================= 1 failed, 26 deselected in 8.90s ======================= 2025-12-04T09:59:13.1940874Z Got exit code 1 2025-12-04T09:59:13.1941544Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1941980Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.1942602Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cf0a0887fe85c292.xml 2025-12-04T09:59:13.1942769Z ============================= test session starts ============================== 2025-12-04T09:59:13.1943119Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1943234Z cachedir: .pytest_cache 2025-12-04T09:59:13.1943745Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1943863Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1943973Z configfile: pytest.ini 2025-12-04T09:59:13.1944505Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1944761Z collecting ... collected 60 items / 3 deselected / 57 selected 2025-12-04T09:59:13.1944909Z stepcurrent: skipping 3 already run items. 2025-12-04T09:59:13.1945020Z Running 24 items in this shard 2025-12-04T09:59:13.1945054Z 2025-12-04T09:59:13.1946102Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda I1204 09:30:53.004000 44529 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 44581 2025-12-04T09:59:13.1946596Z I1204 09:30:53.005000 44529 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 44582 2025-12-04T09:59:13.1947086Z I1204 09:30:53.005000 44529 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 44583 2025-12-04T09:59:13.1947586Z I1204 09:30:53.006000 44529 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 44584 2025-12-04T09:59:13.1949758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.1949860Z _warn_cpu_init() 2025-12-04T09:59:13.1951652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.1951748Z _warn_cpu_init() 2025-12-04T09:59:13.1953553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.1953650Z _warn_cpu_init() 2025-12-04T09:59:13.1955435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.1955555Z _warn_cpu_init() 2025-12-04T09:59:13.1956439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
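The two warnings directly above (sharding initialization running on CPU, and barrier() having to guess the device) both suggest making the device explicit at initialization time. A sketch that follows the messages' own recommendations, with illustrative names and an assumed per-rank entry point:

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def init_and_wrap(model, rank: int, world_size: int):
        device = torch.device("cuda", rank)
        # Passing device_id here also silences the barrier() "device under current context" warning.
        dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)
        torch.cuda.set_device(device)
        # Giving FSDP an explicit device moves sharding init to the GPU (avoiding the CPU-init
        # warning) and satisfies sync_module_states=True, which needs GPU communication.
        return FSDP(model, device_id=device, sync_module_states=True)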
2025-12-04T09:59:13.1956539Z return func(*args, **kwargs)
2025-12-04T09:59:13.1956953Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.1957423Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.1958317Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.1958793Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.1959666Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.1960056Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.1960909Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1961342Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1962193Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1962629Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1963477Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.1963877Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.1964738Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.1965179Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.1966685Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912.
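[Editor's note] The UserWarning repeated above points at a concrete fix on the test side. A minimal sketch of what it recommends, assuming an already-initialized process group and using a hypothetical nn.Linear stand-in for the test's real model (illustrative only, not the test suite's code):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Hypothetical CPU-resident module standing in for the test model.
model = nn.Linear(16, 16)

# Passing `device_id` makes FSDP move the module to that GPU before running
# sharding initialization, which also satisfies the requirement that the
# module live on a GPU device when `sync_module_states=True`.
fsdp_model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    sync_module_states=True,
)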
2025-12-04T09:59:13.1967014Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1967603Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.1968624Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.1968947Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1969586Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.1970069Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T09:59:13.1970478Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.1970944Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.1971863Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.1972316Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.1973213Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.1973572Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.1974421Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1974865Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1975712Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1976155Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1977307Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.1977751Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.1978730Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.1979254Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.1981007Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 611254272 and is now 649003008.
2025-12-04T09:59:13.1981374Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1982045Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.1983202Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.1983568Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1984296Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.1984841Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T09:59:13.1985297Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.1985858Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.1986869Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.1987404Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.1988393Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.1988797Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.1989782Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1990224Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1991071Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1991505Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1992353Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.1992750Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.1993636Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.1994076Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.1995546Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008.
2025-12-04T09:59:13.1995870Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1996497Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.1997499Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.1997823Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1998463Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.1998968Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T09:59:13.1999375Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.1999842Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2000761Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2001207Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2002082Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2002450Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2003303Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2003743Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2004589Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2005031Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2005908Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2006308Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2007173Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2007606Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2009103Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 607059968 and is now 649003008.
2025-12-04T09:59:13.2009429Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2010020Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2011010Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2011329Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2011996Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2012484Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T09:59:13.2012610Z dist init r=1, world=4
2025-12-04T09:59:13.2012696Z dist init r=0, world=4
2025-12-04T09:59:13.2012781Z dist init r=3, world=4
2025-12-04T09:59:13.2012872Z dist init r=2, world=4
2025-12-04T09:59:13.2013900Z [rank0]:[W1204 09:31:21.263361127 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T09:59:13.2014000Z FAILED [29.8483s] [ 4%]
2025-12-04T09:59:13.2014005Z 
2025-12-04T09:59:13.2014134Z =================================== FAILURES ===================================
2025-12-04T09:59:13.2014410Z ____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda _____
2025-12-04T09:59:13.2014524Z Traceback (most recent call last):
2025-12-04T09:59:13.2015011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T09:59:13.2015111Z self._join_processes(fn)
2025-12-04T09:59:13.2015636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T09:59:13.2015759Z self._check_return_codes(fn, elapsed_time)
2025-12-04T09:59:13.2016361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T09:59:13.2016473Z raise RuntimeError(error)
2025-12-04T09:59:13.2016683Z RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T09:59:13.2016809Z Traceback (most recent call last):
2025-12-04T09:59:13.2017518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2017632Z getattr(self, test_name)()
2025-12-04T09:59:13.2018215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2018307Z fn()
2025-12-04T09:59:13.2018825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2018928Z method(*args, **kwargs)
2025-12-04T09:59:13.2019431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2019537Z method(*args, **kwargs)
2025-12-04T09:59:13.2020040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2020146Z with policy():
2025-12-04T09:59:13.2020651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2021010Z raise RuntimeError(msg)
2025-12-04T09:59:13.2022234Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008.
2025-12-04T09:59:13.2022243Z 
2025-12-04T09:59:13.2022461Z To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2023149Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2023156Z 
2025-12-04T09:59:13.2023421Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2023467Z 
2025-12-04T09:59:13.2023471Z 
2025-12-04T09:59:13.2023691Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:59:13.2023967Z Process 3 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:59:13.2024770Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cf0a0887fe85c292.xml -
2025-12-04T09:59:13.2024983Z =========================== short test summary info ============================
2025-12-04T09:59:13.2025821Z FAILED [29.8483s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T09:59:13.2025942Z Traceback (most recent call last):
2025-12-04T09:59:13.2026499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2026614Z getattr(self, test_name)()
2025-12-04T09:59:13.2027160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2027250Z fn()
2025-12-04T09:59:13.2027759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2027875Z method(*args, **kwargs)
2025-12-04T09:59:13.2028383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2028486Z method(*args, **kwargs)
2025-12-04T09:59:13.2028996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2029095Z with policy():
2025-12-04T09:59:13.2029606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2029716Z raise RuntimeError(msg)
2025-12-04T09:59:13.2030958Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008.
2025-12-04T09:59:13.2030972Z 
2025-12-04T09:59:13.2031185Z To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2031863Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2031869Z 
2025-12-04T09:59:13.2032136Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2032316Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:59:13.2032602Z ======================= 1 failed, 3 deselected in 30.07s =======================
2025-12-04T09:59:13.2032711Z Got exit code 1
2025-12-04T09:59:13.2032924Z Retrying single test...
2025-12-04T09:59:13.2033553Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-07c27c95d6f3d3d6.xml
2025-12-04T09:59:13.2033706Z ============================= test session starts ==============================
2025-12-04T09:59:13.2034037Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:59:13.2034142Z cachedir: .pytest_cache
2025-12-04T09:59:13.2034619Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:59:13.2034731Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:59:13.2034834Z configfile: pytest.ini
2025-12-04T09:59:13.2035339Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:59:13.2035598Z collecting ... collected 60 items / 26 deselected / 34 selected
2025-12-04T09:59:13.2036311Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2036418Z Running 1 items in this shard
2025-12-04T09:59:13.2036451Z 
2025-12-04T09:59:13.2037418Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda I1204 09:31:27.594000 44866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 44918
2025-12-04T09:59:13.2037884Z I1204 09:31:27.595000 44866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 44919
2025-12-04T09:59:13.2038353Z I1204 09:31:27.596000 44866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 44920
2025-12-04T09:59:13.2038809Z I1204 09:31:27.596000 44866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 44921
2025-12-04T09:59:13.2040767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2040857Z _warn_cpu_init()
2025-12-04T09:59:13.2042645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2042745Z _warn_cpu_init()
2025-12-04T09:59:13.2044564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2044661Z _warn_cpu_init()
2025-12-04T09:59:13.2046447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2046567Z _warn_cpu_init()
2025-12-04T09:59:13.2047456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.2047566Z return func(*args, **kwargs)
2025-12-04T09:59:13.2047972Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.2048446Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2049345Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2049884Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2050774Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2051153Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2052014Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2052441Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2053296Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2053736Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2054582Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2054983Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2055837Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2056340Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2058163Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 720306176 and is now 758054912.
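[Editor's note] Two of the warnings in this run suggest the same cleanup on the test side: the barrier() warning asks for a `device_id` in `init_process_group`, and the ProcessGroupNCCL warning earlier asks for an explicit destroy_process_group() before exit. A sketch of both together, assuming a recent PyTorch where init_process_group accepts a `device_id` torch.device and that the launcher sets the usual RANK environment variable:

import os
import torch
import torch.distributed as dist

def main() -> None:
    rank = int(os.environ["RANK"])
    device = torch.device("cuda", rank % torch.cuda.device_count())
    # Binding the process group to one device silences the barrier() warning.
    dist.init_process_group("nccl", device_id=device)
    try:
        dist.barrier()
        # ... test or training body ...
    finally:
        # Explicit shutdown, as the ProcessGroupNCCL warning advises.
        dist.destroy_process_group()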
2025-12-04T09:59:13.2058531Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2059201Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2060357Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2060730Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2061446Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2061993Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T09:59:13.2062441Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.2062969Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2064014Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2064549Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2065543Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2065934Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2066908Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2067396Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2068364Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2068964Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2069940Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2070344Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2071227Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2071673Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2073140Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 604962816 and is now 649003008.
2025-12-04T09:59:13.2073461Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2074055Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2075079Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2075412Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2076049Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2076534Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T09:59:13.2076958Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.2077431Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2078324Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2078793Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2079679Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2080026Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2080880Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2081319Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2082168Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2082604Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2083451Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2083858Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2084733Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2085173Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2086639Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 609157120 and is now 649003008.
2025-12-04T09:59:13.2086965Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2088028Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2089033Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2089367Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2090000Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2090518Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T09:59:13.2090924Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.2091443Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2092342Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2092792Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2093674Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2094026Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2094880Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2095318Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2096171Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2096852Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2097859Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2098311Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2099272Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2099761Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2101463Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008.
2025-12-04T09:59:13.2101834Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2102502Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2103624Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2103995Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2104741Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2105285Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T09:59:13.2105419Z dist init r=1, world=4
2025-12-04T09:59:13.2105519Z dist init r=0, world=4
2025-12-04T09:59:13.2105619Z dist init r=2, world=4
2025-12-04T09:59:13.2105715Z dist init r=3, world=4
2025-12-04T09:59:13.2106863Z [rank0]:[W1204 09:31:54.297324510 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T09:59:13.2106971Z FAILED [28.4302s] [100%]
2025-12-04T09:59:13.2106977Z 
2025-12-04T09:59:13.2107127Z =================================== FAILURES ===================================
2025-12-04T09:59:13.2107442Z ____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda _____
2025-12-04T09:59:13.2107562Z Traceback (most recent call last):
2025-12-04T09:59:13.2108114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T09:59:13.2108235Z self._join_processes(fn)
2025-12-04T09:59:13.2108816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T09:59:13.2109061Z self._check_return_codes(fn, elapsed_time)
2025-12-04T09:59:13.2109611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T09:59:13.2109714Z raise RuntimeError(error)
2025-12-04T09:59:13.2109929Z RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T09:59:13.2110037Z Traceback (most recent call last):
2025-12-04T09:59:13.2110520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2110625Z getattr(self, test_name)()
2025-12-04T09:59:13.2111128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2111208Z fn()
2025-12-04T09:59:13.2111671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2111764Z method(*args, **kwargs)
2025-12-04T09:59:13.2112222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2112312Z method(*args, **kwargs)
2025-12-04T09:59:13.2112757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2112852Z with policy():
2025-12-04T09:59:13.2113323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2113430Z raise RuntimeError(msg)
2025-12-04T09:59:13.2114494Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 720306176 and is now 758054912.
2025-12-04T09:59:13.2114502Z 
2025-12-04T09:59:13.2114692Z To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2115298Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2115303Z 
2025-12-04T09:59:13.2115539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2115571Z 
2025-12-04T09:59:13.2115576Z 
2025-12-04T09:59:13.2115782Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:59:13.2116016Z Process 0 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:59:13.2116729Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-07c27c95d6f3d3d6.xml -
2025-12-04T09:59:13.2116915Z =========================== short test summary info ============================
2025-12-04T09:59:13.2117653Z FAILED [28.4302s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T09:59:13.2117772Z Traceback (most recent call last):
2025-12-04T09:59:13.2118260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2118361Z getattr(self, test_name)()
2025-12-04T09:59:13.2118848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2118928Z fn()
2025-12-04T09:59:13.2119393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2119490Z method(*args, **kwargs)
2025-12-04T09:59:13.2119936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2120041Z method(*args, **kwargs)
2025-12-04T09:59:13.2120484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2120569Z with policy():
2025-12-04T09:59:13.2121429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2121544Z raise RuntimeError(msg)
2025-12-04T09:59:13.2122829Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 720306176 and is now 758054912.
2025-12-04T09:59:13.2122838Z 
2025-12-04T09:59:13.2123057Z To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2123733Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2123747Z 
2025-12-04T09:59:13.2124015Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2124193Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:59:13.2124377Z ====================== 1 failed, 26 deselected in 28.65s =======================
2025-12-04T09:59:13.2124476Z Got exit code 1
2025-12-04T09:59:13.2124583Z Retrying single test...
2025-12-04T09:59:13.2125251Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ec3b2535e8e2ad7.xml
2025-12-04T09:59:13.2125415Z ============================= test session starts ==============================
2025-12-04T09:59:13.2125780Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:59:13.2125888Z cachedir: .pytest_cache
2025-12-04T09:59:13.2126407Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:59:13.2126535Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:59:13.2126642Z configfile: pytest.ini
2025-12-04T09:59:13.2127179Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:59:13.2127434Z collecting ... collected 60 items / 26 deselected / 34 selected
2025-12-04T09:59:13.2128188Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2128346Z Running 1 items in this shard
2025-12-04T09:59:13.2128352Z 
2025-12-04T09:59:13.2129382Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda I1204 09:32:00.694000 45203 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 45255
2025-12-04T09:59:13.2129888Z I1204 09:32:00.695000 45203 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 45256
2025-12-04T09:59:13.2130381Z I1204 09:32:00.696000 45203 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 45257
2025-12-04T09:59:13.2130872Z I1204 09:32:00.697000 45203 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 45258
2025-12-04T09:59:13.2132917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2133017Z _warn_cpu_init()
2025-12-04T09:59:13.2134965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2135055Z _warn_cpu_init()
2025-12-04T09:59:13.2137159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2137261Z _warn_cpu_init()
2025-12-04T09:59:13.2139306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2139402Z _warn_cpu_init()
2025-12-04T09:59:13.2140402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.2140529Z return func(*args, **kwargs)
2025-12-04T09:59:13.2140991Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.2141536Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2142536Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2143089Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2144090Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2144511Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2145479Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2145965Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2146941Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2147428Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2148381Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2148920Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2149773Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2150221Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2151719Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 720306176 and is now 758054912.
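[Editor's note] The exit-code-10 plumbing visible in these tracebacks works in two halves: each rank runs the test in a child process and exits with code 10 when the leak check raises, and the parent joins the children and converts any nonzero exit code into the RuntimeError shown in the FAILURES sections. A simplified stand-in for that parent-side pattern (the real harness is _join_processes/_check_return_codes in common_distributed.py; _worker here is hypothetical):

import torch.multiprocessing as mp

def _worker(rank: int) -> None:
    # The real harness runs the test body here; the mem-leak check makes the
    # child exit nonzero (code 10) when it raises.
    pass

def run_multiprocess_test(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=_worker, args=(r,)) for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")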
2025-12-04T09:59:13.2152053Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2152633Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2153662Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2153987Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2154624Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2155117Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2155516Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2155989Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2156910Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2157387Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2158266Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2158619Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2159480Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2159920Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2160774Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2161207Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2162056Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2162458Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2163336Z [rank1]:E1204 09:32:29.251000 45256 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2163783Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2165241Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.2165569Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2166184Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2167191Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2167511Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2168149Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2168635Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2169063Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2169541Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2170433Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2170902Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2171786Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2172139Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2173000Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2173430Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2174281Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2174707Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2175553Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2175961Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2177116Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2177626Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2179285Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.2179689Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2180349Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2181480Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2181843Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2182558Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2183146Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2183599Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2184167Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2185171Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2185676Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2186665Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2187068Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2188040Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2188533Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2189512Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2189944Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2190819Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2191219Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2192075Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2192515Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2194030Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.2194360Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2194941Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2195935Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2196262Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2196919Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2197406Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2197519Z dist init r=1, world=4 2025-12-04T09:59:13.2197606Z dist init r=2, world=4 2025-12-04T09:59:13.2197699Z dist init r=0, world=4 2025-12-04T09:59:13.2197782Z dist init r=3, world=4 2025-12-04T09:59:13.2198817Z [rank0]:[W1204 09:32:29.277226647 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2198909Z FAILED [30.5436s] [100%] 2025-12-04T09:59:13.2198914Z 2025-12-04T09:59:13.2199044Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2199321Z ____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda _____ 2025-12-04T09:59:13.2199428Z Traceback (most recent call last): 2025-12-04T09:59:13.2199923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2200024Z self._join_processes(fn) 2025-12-04T09:59:13.2200547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2200680Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2201217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2201314Z raise RuntimeError(error) 2025-12-04T09:59:13.2201523Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2201628Z Traceback (most recent call last): 2025-12-04T09:59:13.2202115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2202211Z getattr(self, test_name)() 2025-12-04T09:59:13.2202708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2202798Z fn() 2025-12-04T09:59:13.2203252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2203346Z method(*args, **kwargs) 2025-12-04T09:59:13.2203797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.2203891Z method(*args, **kwargs) 2025-12-04T09:59:13.2204349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2204439Z with policy(): 2025-12-04T09:59:13.2204922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2205025Z raise RuntimeError(msg) 2025-12-04T09:59:13.2206092Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 720306176 and is now 758054912. 2025-12-04T09:59:13.2206097Z 2025-12-04T09:59:13.2206293Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2206891Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2206896Z 2025-12-04T09:59:13.2207129Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2207168Z 2025-12-04T09:59:13.2207172Z 2025-12-04T09:59:13.2207364Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2207598Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.2208340Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ec3b2535e8e2ad7.xml - 2025-12-04T09:59:13.2208491Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2209233Z FAILED [30.5436s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2209340Z Traceback (most recent call last): 2025-12-04T09:59:13.2209825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2209932Z getattr(self, test_name)() 2025-12-04T09:59:13.2210410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2210493Z fn() 2025-12-04T09:59:13.2210952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2211044Z method(*args, **kwargs) 2025-12-04T09:59:13.2211500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2211594Z method(*args, **kwargs) 2025-12-04T09:59:13.2212041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2212134Z with policy(): 2025-12-04T09:59:13.2212583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2212681Z raise RuntimeError(msg) 2025-12-04T09:59:13.2213794Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. 
CUDA driver allocated memory was 720306176 and is now 758054912. 2025-12-04T09:59:13.2213802Z 2025-12-04T09:59:13.2213990Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2214595Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2214600Z 2025-12-04T09:59:13.2214831Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2215001Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.2215166Z ====================== 1 failed, 26 deselected in 30.76s ======================= 2025-12-04T09:59:13.2215250Z Got exit code 1 2025-12-04T09:59:13.2215814Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2216175Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.2216982Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c7bc1bec56d6360.xml 2025-12-04T09:59:13.2217163Z ============================= test session starts ============================== 2025-12-04T09:59:13.2217508Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2217625Z cachedir: .pytest_cache 2025-12-04T09:59:13.2218139Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2218299Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2218416Z configfile: pytest.ini 2025-12-04T09:59:13.2218952Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2219174Z collecting ... collected 60 items / 4 deselected / 56 selected 2025-12-04T09:59:13.2219344Z stepcurrent: skipping 4 already run items. 2025-12-04T09:59:13.2219456Z Running 23 items in this shard 2025-12-04T09:59:13.2219461Z 2025-12-04T09:59:13.2220547Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda I1204 09:32:35.974000 45540 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 45592 2025-12-04T09:59:13.2221269Z I1204 09:32:35.975000 45540 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 45593 2025-12-04T09:59:13.2221777Z I1204 09:32:35.975000 45540 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 45594 2025-12-04T09:59:13.2222273Z I1204 09:32:35.976000 45540 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 45595 2025-12-04T09:59:13.2224306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.2224420Z _warn_cpu_init() 2025-12-04T09:59:13.2226428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2226611Z _warn_cpu_init() 2025-12-04T09:59:13.2228626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2228733Z _warn_cpu_init() 2025-12-04T09:59:13.2230776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2230892Z _warn_cpu_init() 2025-12-04T09:59:13.2231895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.2232006Z return func(*args, **kwargs) 2025-12-04T09:59:13.2232586Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2233112Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2234133Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2234627Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2235632Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2236025Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2236954Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2237440Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2238372Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2238856Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2239787Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2240229Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2241171Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2241689Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2243407Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 
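[editor note] The RuntimeError above is raised by the CUDA memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: on each device the caching allocator grew from 512 to 49,664 bytes (about 48 KiB) across the test, and driver-level allocation grew by roughly 36-40 MB (758054912 - 720306176 = 37,748,736 bytes on device 0). The rough idea is a before/after comparison around the test body. The sketch below only illustrates that idea with public torch.cuda APIs; it is not the implementation in torch/testing/_internal/common_utils.py:

# Rough illustration of the before/after comparison behind the leak check;
# not the actual checker in torch/testing/_internal/common_utils.py.
import torch

def check_for_leak(fn, device=0):
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)    # caching allocator
    free_before, _total = torch.cuda.mem_get_info(device)  # driver-level view

    fn()

    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    if alloc_after > alloc_before:
        raise RuntimeError(
            f"possible leak: caching allocator went from {alloc_before} "
            f"to {alloc_after} bytes; driver usage grew by "
            f"{free_before - free_after} bytes"
        )

# Example: a tensor kept alive in a global survives the test body and is
# reported as a leak (this call raises RuntimeError by design).
_kept = []
check_for_leak(lambda: _kept.append(torch.ones(1024, device="cuda")))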
2025-12-04T09:59:13.2243753Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2244393Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2245521Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2245878Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2246547Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2247059Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2247493Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2248022Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2248975Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2249477Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2250416Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2250794Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2251701Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2252173Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2253081Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2253729Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2254658Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2255105Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2256066Z [rank1]:E1204 09:33:03.243000 45593 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2256625Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2258493Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.2258866Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2259646Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2260808Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2261182Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2261892Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2262467Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2262933Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2263493Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2264510Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2265015Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2266018Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2266423Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2267387Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2267884Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2268842Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2269409Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2270298Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2270705Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2271561Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2271995Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2273525Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.2273854Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2274452Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2275484Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2275816Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2276477Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2276961Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2277393Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2277862Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2278760Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2279211Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2280101Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2280456Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2281304Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2281750Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2282613Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2283053Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2283923Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2284332Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2285181Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2285612Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2287146Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 
2025-12-04T09:59:13.2287472Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2288064Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2289098Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2289457Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2290089Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2290610Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2290711Z dist init r=1, world=4 2025-12-04T09:59:13.2290798Z dist init r=2, world=4 2025-12-04T09:59:13.2290887Z dist init r=3, world=4 2025-12-04T09:59:13.2290981Z dist init r=0, world=4 2025-12-04T09:59:13.2292002Z [rank0]:[W1204 09:33:03.268184081 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2292103Z FAILED [28.9438s] [ 4%] 2025-12-04T09:59:13.2292108Z 2025-12-04T09:59:13.2292239Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2292534Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.2292648Z Traceback (most recent call last): 2025-12-04T09:59:13.2293143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2293249Z self._join_processes(fn) 2025-12-04T09:59:13.2293766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2293891Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2294436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2294541Z raise RuntimeError(error) 2025-12-04T09:59:13.2294762Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.2294868Z Traceback (most recent call last): 2025-12-04T09:59:13.2295369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2295478Z getattr(self, test_name)() 2025-12-04T09:59:13.2295959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2296037Z fn() 2025-12-04T09:59:13.2296575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2296833Z method(*args, **kwargs) 2025-12-04T09:59:13.2297358Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2297461Z method(*args, **kwargs) 2025-12-04T09:59:13.2297997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2298106Z with policy(): 2025-12-04T09:59:13.2298617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2298728Z raise RuntimeError(msg) 2025-12-04T09:59:13.2299975Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.2299981Z 2025-12-04T09:59:13.2300193Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2300946Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2300951Z 2025-12-04T09:59:13.2301216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2301221Z 2025-12-04T09:59:13.2301396Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2301549Z Traceback (most recent call last): 2025-12-04T09:59:13.2302100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2302220Z getattr(self, test_name)() 2025-12-04T09:59:13.2302754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2302841Z fn() 2025-12-04T09:59:13.2303356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2303460Z method(*args, **kwargs) 2025-12-04T09:59:13.2303972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2304078Z method(*args, **kwargs) 2025-12-04T09:59:13.2304580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2304687Z with policy(): 2025-12-04T09:59:13.2305195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2305303Z raise RuntimeError(msg) 2025-12-04T09:59:13.2306547Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 
2025-12-04T09:59:13.2306555Z 2025-12-04T09:59:13.2306773Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2307494Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2307527Z 2025-12-04T09:59:13.2307795Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2307801Z 2025-12-04T09:59:13.2307805Z 2025-12-04T09:59:13.2308032Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2308290Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.2309288Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c7bc1bec56d6360.xml - 2025-12-04T09:59:13.2309447Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2310255Z FAILED [28.9438s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.2310374Z Traceback (most recent call last): 2025-12-04T09:59:13.2310864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2310966Z getattr(self, test_name)() 2025-12-04T09:59:13.2311457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2311537Z fn() 2025-12-04T09:59:13.2311995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2312086Z method(*args, **kwargs) 2025-12-04T09:59:13.2312537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2312666Z method(*args, **kwargs) 2025-12-04T09:59:13.2313113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2313200Z with policy(): 2025-12-04T09:59:13.2313663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2313790Z raise RuntimeError(msg) 2025-12-04T09:59:13.2314899Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 609157120 and is now 649003008. 
2025-12-04T09:59:13.2314904Z 2025-12-04T09:59:13.2315097Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2315727Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2315745Z 2025-12-04T09:59:13.2315982Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2315986Z 2025-12-04T09:59:13.2316136Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2316255Z Traceback (most recent call last): 2025-12-04T09:59:13.2316743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2316842Z getattr(self, test_name)() 2025-12-04T09:59:13.2317326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2317405Z fn() 2025-12-04T09:59:13.2317867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2317962Z method(*args, **kwargs) 2025-12-04T09:59:13.2318411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2318516Z method(*args, **kwargs) 2025-12-04T09:59:13.2318993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2319084Z with policy(): 2025-12-04T09:59:13.2319553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2319647Z raise RuntimeError(msg) 2025-12-04T09:59:13.2320889Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T09:59:13.2320899Z 2025-12-04T09:59:13.2321098Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2322014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2322022Z 2025-12-04T09:59:13.2322288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2322468Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.2322657Z ======================= 1 failed, 4 deselected in 29.17s ======================= 2025-12-04T09:59:13.2322756Z Got exit code 1 2025-12-04T09:59:13.2322858Z Retrying single test... 
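[editor note] Before the runner retries the single failing test in the next session, note the ProcessGroupNCCL warning earlier in this run: destroy_process_group() was not called before the worker processes exited, which can leak resources. A minimal teardown pattern that avoids that warning is sketched below; it is illustrative only and is not how the multiprocess test harness in common_distributed.py manages its workers:

# Illustrative teardown pattern for the ProcessGroupNCCL warning seen above.
import torch.distributed as dist

def main():
    dist.init_process_group("nccl")
    try:
        ...  # training / test body goes here
    finally:
        # Tear the group down explicitly so NCCL resources are released
        # before the interpreter exits.
        dist.destroy_process_group()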
2025-12-04T09:59:13.2323490Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1003ee713f2c1e3e.xml 2025-12-04T09:59:13.2323652Z ============================= test session starts ============================== 2025-12-04T09:59:13.2324052Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2324160Z cachedir: .pytest_cache 2025-12-04T09:59:13.2324677Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2324844Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2324953Z configfile: pytest.ini 2025-12-04T09:59:13.2325492Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2325716Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.2326510Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2326633Z Running 1 items in this shard 2025-12-04T09:59:13.2326638Z 2025-12-04T09:59:13.2327708Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda I1204 09:33:09.414000 45877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 45929 2025-12-04T09:59:13.2328215Z I1204 09:33:09.415000 45877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 45930 2025-12-04T09:59:13.2328711Z I1204 09:33:09.415000 45877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 45931 2025-12-04T09:59:13.2329200Z I1204 09:33:09.416000 45877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 45932 2025-12-04T09:59:13.2331236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2331337Z _warn_cpu_init() 2025-12-04T09:59:13.2333391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2333492Z _warn_cpu_init() 2025-12-04T09:59:13.2335462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2335556Z _warn_cpu_init() 2025-12-04T09:59:13.2337672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2337772Z _warn_cpu_init() 2025-12-04T09:59:13.2338774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.2338949Z return func(*args, **kwargs) 2025-12-04T09:59:13.2339414Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2339991Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2340993Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2341509Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2342497Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2342901Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2343870Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2344355Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2345322Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2345808Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2346813Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2347263Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2348224Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2348724Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2350469Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 2025-12-04T09:59:13.2350831Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2351451Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2352546Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2352888Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2353589Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2354110Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2354566Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2355070Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2356098Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2356562Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2357443Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2357801Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2358657Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2359090Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2359950Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2360384Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2361272Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2361674Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2362531Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2362976Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2364502Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.2364836Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2365419Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2366457Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2366804Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2367437Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2367956Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2368357Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2368835Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2369723Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2370189Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2371064Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2371414Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2372270Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2372701Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2373591Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2374024Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2374872Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2375277Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2376129Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2376680Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2378526Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.2378909Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2379569Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2380770Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2381169Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2381882Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2382436Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2382886Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2383422Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2384431Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2384944Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2385939Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2386333Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2387302Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2387817Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2388901Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2389473Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2390321Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2390727Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2391624Z [rank3]:E1204 09:33:35.552000 45932 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2392068Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2393560Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T09:59:13.2393919Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2394507Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2395540Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2395893Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2396525Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2397011Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2397102Z dist init r=0, world=4 2025-12-04T09:59:13.2397199Z dist init r=3, world=4 2025-12-04T09:59:13.2397285Z dist init r=2, world=4 2025-12-04T09:59:13.2397370Z dist init r=1, world=4 2025-12-04T09:59:13.2398405Z [rank0]:[W1204 09:33:35.567190403 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2398495Z FAILED [27.9448s] [100%] 2025-12-04T09:59:13.2398500Z 2025-12-04T09:59:13.2398631Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2398932Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.2399038Z Traceback (most recent call last): 2025-12-04T09:59:13.2399536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2399637Z self._join_processes(fn) 2025-12-04T09:59:13.2400159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2400324Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2400866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2400979Z raise RuntimeError(error) 2025-12-04T09:59:13.2401185Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2401294Z Traceback (most recent call last): 2025-12-04T09:59:13.2401779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2401879Z getattr(self, test_name)() 2025-12-04T09:59:13.2402355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2402445Z fn() 2025-12-04T09:59:13.2402919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2403023Z method(*args, **kwargs) 2025-12-04T09:59:13.2403473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2403569Z method(*args, **kwargs) 2025-12-04T09:59:13.2404023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2404109Z with policy(): 2025-12-04T09:59:13.2404561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2404664Z raise RuntimeError(msg) 2025-12-04T09:59:13.2405789Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 
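The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit, which can leak resources") is about explicit teardown of the default process group. A minimal sketch of the init/teardown pairing it asks for, assuming the usual RANK/WORLD_SIZE/MASTER_ADDR environment-variable rendezvous rather than the test harness's own setup:

    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        rank = int(os.environ["RANK"])
        world_size = int(os.environ["WORLD_SIZE"])
        torch.cuda.set_device(rank % torch.cuda.device_count())
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            dist.barrier()
            # ... test or training body ...
        finally:
            # Explicit teardown avoids the "can leak resources" warning at exit.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()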
2025-12-04T09:59:13.2405796Z 2025-12-04T09:59:13.2406031Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2406661Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2406666Z 2025-12-04T09:59:13.2406906Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2406911Z 2025-12-04T09:59:13.2406915Z 2025-12-04T09:59:13.2407112Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2407346Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.2408077Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1003ee713f2c1e3e.xml - 2025-12-04T09:59:13.2408232Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2409016Z FAILED [27.9448s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2409124Z Traceback (most recent call last): 2025-12-04T09:59:13.2409614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2409722Z getattr(self, test_name)() 2025-12-04T09:59:13.2410198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2410289Z fn() 2025-12-04T09:59:13.2410737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2410828Z method(*args, **kwargs) 2025-12-04T09:59:13.2411312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2411408Z method(*args, **kwargs) 2025-12-04T09:59:13.2411855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2411956Z with policy(): 2025-12-04T09:59:13.2412406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2412512Z raise RuntimeError(msg) 2025-12-04T09:59:13.2413607Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T09:59:13.2413615Z 2025-12-04T09:59:13.2413830Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2414480Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2414486Z 2025-12-04T09:59:13.2414721Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2414888Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
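The repro line printed with each failure sets PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 and runs the test file directly; PYTORCH_PRINT_REPRO_ON_FAILURE=0 silences the message itself. One way to drive that exact command from Python with the environment set explicitly (run from the base repo dir, as the log says; this is just a convenience wrapper, not part of the harness):

    import os
    import subprocess

    env = dict(os.environ)
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"   # enable the leak checker
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"   # uncomment to suppress the repro message

    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda",
        ],
        env=env,
        check=True,
    )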
2025-12-04T09:59:13.2415047Z ====================== 1 failed, 26 deselected in 28.17s ======================= 2025-12-04T09:59:13.2415136Z Got exit code 1 2025-12-04T09:59:13.2415237Z Retrying single test... 2025-12-04T09:59:13.2415786Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-86ef8482fc5a0e9d.xml 2025-12-04T09:59:13.2415967Z ============================= test session starts ============================== 2025-12-04T09:59:13.2416336Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2416451Z cachedir: .pytest_cache 2025-12-04T09:59:13.2417155Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2417279Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2417386Z configfile: pytest.ini 2025-12-04T09:59:13.2417928Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2418142Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.2418939Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2419051Z Running 1 items in this shard 2025-12-04T09:59:13.2419056Z 2025-12-04T09:59:13.2420117Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda I1204 09:33:42.024000 46214 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 46266 2025-12-04T09:59:13.2420632Z I1204 09:33:42.024000 46214 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 46267 2025-12-04T09:59:13.2421348Z I1204 09:33:42.025000 46214 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 46268 2025-12-04T09:59:13.2421848Z I1204 09:33:42.026000 46214 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 46269 2025-12-04T09:59:13.2424306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2424422Z _warn_cpu_init() 2025-12-04T09:59:13.2426429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
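The repeated UserWarning from fsdp/_init_utils.py recommends passing device_id so FSDP moves the module to GPU before sharding initialization, which is also required for sync_module_states=True. A minimal sketch of that, assuming the process group is already initialized and using a toy nn.Linear in place of the test's real models:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_module(local_rank: int) -> FSDP:
        torch.cuda.set_device(local_rank)
        module = nn.Linear(1024, 1024)  # constructed on CPU, as in the warning
        # device_id moves the module to this rank's GPU for sharding init and
        # satisfies the GPU requirement of sync_module_states=True.
        return FSDP(
            module,
            device_id=torch.device("cuda", local_rank),
            sync_module_states=True,
        )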
2025-12-04T09:59:13.2426536Z _warn_cpu_init() 2025-12-04T09:59:13.2428581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2428686Z _warn_cpu_init() 2025-12-04T09:59:13.2430702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2430840Z _warn_cpu_init() 2025-12-04T09:59:13.2431846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.2431957Z return func(*args, **kwargs) 2025-12-04T09:59:13.2432588Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2433194Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2434078Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2434535Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2435417Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2435782Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2436634Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2437071Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2437922Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2438357Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2439254Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2439654Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2440512Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2440945Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2442481Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 2025-12-04T09:59:13.2442810Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2443393Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2444432Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2444779Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2445425Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2445936Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2446341Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2446812Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2447696Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2448154Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2449029Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2449392Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2450241Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2450685Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2451543Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2452007Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2452869Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2453263Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2454129Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2454591Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2456096Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 611254272 and is now 649003008. 
2025-12-04T09:59:13.2456495Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2457303Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2458524Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2458892Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2459647Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2460187Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2460651Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2461183Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2462188Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2462703Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2463692Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2464097Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2465060Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2465560Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2466614Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2467101Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2468071Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2468516Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2469671Z [rank3]:E1204 09:34:09.432000 46269 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2470110Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2471620Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T09:59:13.2471941Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2472550Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2473602Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2473950Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2474595Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2475078Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2475487Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2475961Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2476849Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2477311Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2478188Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2478545Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2479423Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2479871Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2480722Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2481157Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2482011Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2482431Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2483294Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2483731Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2485246Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.2485599Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2486193Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2487268Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2487590Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2488234Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2488716Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2488815Z dist init r=0, world=4 2025-12-04T09:59:13.2488903Z dist init r=2, world=4 2025-12-04T09:59:13.2488992Z dist init r=1, world=4 2025-12-04T09:59:13.2489088Z dist init r=3, world=4 2025-12-04T09:59:13.2490116Z [rank0]:[W1204 09:34:09.443226628 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2490205Z FAILED [28.9137s] [100%] 2025-12-04T09:59:13.2490210Z 2025-12-04T09:59:13.2490347Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2490636Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.2490752Z Traceback (most recent call last): 2025-12-04T09:59:13.2491239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2491336Z self._join_processes(fn) 2025-12-04T09:59:13.2491889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2492020Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2492558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2492669Z raise RuntimeError(error) 2025-12-04T09:59:13.2492877Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2492990Z Traceback (most recent call last): 2025-12-04T09:59:13.2493472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2493571Z getattr(self, test_name)() 2025-12-04T09:59:13.2494079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2494161Z fn() 2025-12-04T09:59:13.2494612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2494714Z method(*args, **kwargs) 2025-12-04T09:59:13.2495158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2495255Z method(*args, **kwargs) 2025-12-04T09:59:13.2495700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2495790Z with policy(): 2025-12-04T09:59:13.2496248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2496446Z raise RuntimeError(msg) 2025-12-04T09:59:13.2497841Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.2497885Z 2025-12-04T09:59:13.2498098Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2498806Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2498818Z 2025-12-04T09:59:13.2514014Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2514032Z 2025-12-04T09:59:13.2514036Z 2025-12-04T09:59:13.2514335Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2514592Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.2515530Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-86ef8482fc5a0e9d.xml - 2025-12-04T09:59:13.2515710Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2516534Z FAILED [28.9137s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2516705Z Traceback (most recent call last): 2025-12-04T09:59:13.2517239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2517352Z getattr(self, test_name)() 2025-12-04T09:59:13.2517879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2517968Z fn() 2025-12-04T09:59:13.2518462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2518686Z method(*args, **kwargs) 2025-12-04T09:59:13.2519190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2519303Z method(*args, **kwargs) 2025-12-04T09:59:13.2519818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2519913Z with policy(): 2025-12-04T09:59:13.2520399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2520504Z raise RuntimeError(msg) 2025-12-04T09:59:13.2522663Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T09:59:13.2522674Z 2025-12-04T09:59:13.2522900Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2523630Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2523636Z 2025-12-04T09:59:13.2523905Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2524086Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
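The session headers report hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]. Those values come straight from the log; a minimal sketch of how such a profile is registered and activated with the hypothesis API (where PyTorch actually registers it is not shown in this log):

    from hypothesis import HealthCheck, settings

    settings.register_profile(
        "pytorch_ci",
        database=None,                                 # no example database on CI
        max_examples=50,                               # cap generated examples per test
        derandomize=True,                              # deterministic example generation
        suppress_health_check=[HealthCheck.too_slow],
    )
    settings.load_profile("pytorch_ci")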
2025-12-04T09:59:13.2524274Z ====================== 1 failed, 26 deselected in 29.13s ======================= 2025-12-04T09:59:13.2524376Z Got exit code 1 2025-12-04T09:59:13.2525064Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2525481Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.2526102Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e9238188d8477a2.xml 2025-12-04T09:59:13.2526322Z ============================= test session starts ============================== 2025-12-04T09:59:13.2526677Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2526785Z cachedir: .pytest_cache 2025-12-04T09:59:13.2527308Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2527435Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2527549Z configfile: pytest.ini 2025-12-04T09:59:13.2528087Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2528302Z collecting ... collected 60 items / 5 deselected / 55 selected 2025-12-04T09:59:13.2528452Z stepcurrent: skipping 5 already run items. 2025-12-04T09:59:13.2528569Z Running 22 items in this shard 2025-12-04T09:59:13.2528575Z 2025-12-04T09:59:13.2529618Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda I1204 09:34:15.614000 46551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 46603 2025-12-04T09:59:13.2530117Z I1204 09:34:15.615000 46551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 46604 2025-12-04T09:59:13.2530612Z I1204 09:34:15.615000 46551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 46605 2025-12-04T09:59:13.2531116Z I1204 09:34:15.616000 46551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 46606 2025-12-04T09:59:13.2533237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2533355Z _warn_cpu_init() 2025-12-04T09:59:13.2535621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.2535794Z _warn_cpu_init() 2025-12-04T09:59:13.2542293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2542410Z _warn_cpu_init() 2025-12-04T09:59:13.2544432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2544565Z _warn_cpu_init() 2025-12-04T09:59:13.2545583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.2545732Z return func(*args, **kwargs) 2025-12-04T09:59:13.2546195Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2546741Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2547742Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2548368Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2549339Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2549730Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2550667Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2551135Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2552104Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2552579Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2553516Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2553947Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2554889Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2555396Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2556996Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 2025-12-04T09:59:13.2557361Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2557999Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2559120Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2559473Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2560253Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2560792Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2561359Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2561906Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2562857Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2563341Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2564266Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2564647Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2565559Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T09:59:13.2566017Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2566971Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2567430Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2568331Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2568749Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2569698Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2570163Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2571704Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 
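Every rank is failing the same way here: the leak checker that wraps the test body sees the caching allocator grow from 512 to 12800 bytes and the driver-level allocation grow as well, then raises. As a rough illustration of the before/after comparison involved (a minimal sketch under assumed semantics, not PyTorch's actual leak-check implementation; the helper name and the raise condition are invented):

import torch

def check_cuda_leak(test_fn, device=0):
    # Invented helper sketching the comparison; the real check in
    # torch/testing/_internal/common_utils.py is more careful (it frees
    # cached blocks, rechecks, and covers every visible device).
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)    # caching-allocator view
    free_before, total = torch.cuda.mem_get_info(device)  # driver view (cudaMemGetInfo)
    test_fn()
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    if alloc_after > alloc_before and (total - free_after) > (total - free_before):
        raise RuntimeError(
            f"possible CUDA leak: allocator {alloc_before} -> {alloc_after} bytes, "
            f"driver {total - free_before} -> {total - free_after} bytes"
        )

The "driver API confirmed" wording in the error appears to reflect that second comparison: allocator growth alone can be a caching artifact, so the failure is only reported as confirmed once the driver-level numbers agree.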
2025-12-04T09:59:13.2572054Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2572721Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2573779Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2574151Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2574823Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2575345Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2575773Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2576346Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2577509Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2578029Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2579017Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2579419Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2580432Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2580924Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2581894Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2582379Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2583343Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2583818Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2584786Z [rank3]:E1204 09:34:46.170000 46606 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2585287Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2586940Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.2587351Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2588018Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2589263Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2589590Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2590226Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2590718Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2591118Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2591605Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2592493Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2592959Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2593841Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2594198Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2595090Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2595524Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2596391Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2596828Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2597720Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2598116Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2598972Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2599423Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2600890Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.2601280Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2601863Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2602868Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2603190Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2603832Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2604329Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2604424Z dist init r=0, world=4 2025-12-04T09:59:13.2604525Z dist init r=1, world=4 2025-12-04T09:59:13.2604619Z dist init r=3, world=4 2025-12-04T09:59:13.2604706Z dist init r=2, world=4 2025-12-04T09:59:13.2605746Z [rank0]:[W1204 09:34:46.187451248 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2605838Z FAILED [32.7218s] [ 4%] 2025-12-04T09:59:13.2605844Z 2025-12-04T09:59:13.2605987Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2606259Z _____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda _____ 2025-12-04T09:59:13.2606370Z Traceback (most recent call last): 2025-12-04T09:59:13.2606900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2607004Z self._join_processes(fn) 2025-12-04T09:59:13.2607524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2607659Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2608192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2608301Z raise RuntimeError(error) 2025-12-04T09:59:13.2608508Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2608617Z Traceback (most recent call last): 2025-12-04T09:59:13.2609131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2609232Z getattr(self, test_name)() 2025-12-04T09:59:13.2609705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2609796Z fn() 2025-12-04T09:59:13.2610244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2610346Z method(*args, **kwargs) 2025-12-04T09:59:13.2610800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2610893Z method(*args, **kwargs) 2025-12-04T09:59:13.2611350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2611462Z with policy(): 2025-12-04T09:59:13.2611913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2612021Z raise RuntimeError(msg) 2025-12-04T09:59:13.2613086Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
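The ProcessGroupNCCL warning above is actionable independently of the leak: each worker exits without calling destroy_process_group(). A minimal sketch of the recommended shutdown sequence (run_worker and the env-var rendezvous are placeholders; the launcher is assumed to set RANK, MASTER_ADDR, and MASTER_PORT):

import os
import torch
import torch.distributed as dist

def run_worker():
    rank = int(os.environ["RANK"])  # provided by the launcher
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(rank % torch.cuda.device_count())
    try:
        ...  # test or training body goes here
    finally:
        # Explicit teardown releases NCCL resources and silences the warning.
        if dist.is_initialized():
            dist.destroy_process_group()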
2025-12-04T09:59:13.2613121Z 2025-12-04T09:59:13.2613328Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2613921Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2613926Z 2025-12-04T09:59:13.2614173Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2614181Z 2025-12-04T09:59:13.2614324Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2614430Z Traceback (most recent call last): 2025-12-04T09:59:13.2614930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2615031Z getattr(self, test_name)() 2025-12-04T09:59:13.2615516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2615593Z fn() 2025-12-04T09:59:13.2616040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2616144Z method(*args, **kwargs) 2025-12-04T09:59:13.2616836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2616952Z method(*args, **kwargs) 2025-12-04T09:59:13.2617464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2617561Z with policy(): 2025-12-04T09:59:13.2618117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2618232Z raise RuntimeError(msg) 2025-12-04T09:59:13.2619424Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.2619430Z 2025-12-04T09:59:13.2619654Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2620319Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2620327Z 2025-12-04T09:59:13.2620603Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2620609Z 2025-12-04T09:59:13.2620655Z 2025-12-04T09:59:13.2621125Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2621399Z Process 0 terminated with exit code 10, terminating remaining processes. 
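The _warn_cpu_init() UserWarnings near the top of each session are also self-describing: FSDP is asking for a device_id so that sharding initialization runs on the GPU rather than on CPU. A sketch of the construction the warning suggests, assuming a process group is already initialized (the Linear module is a stand-in for whatever model the test builds on CPU):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

module = nn.Linear(8, 8)  # stand-in for the CPU-constructed model
wrapped = FSDP(
    module,
    device_id=torch.cuda.current_device(),  # run sharding init on the GPU
    sync_module_states=True,                # needs the module on GPU, hence device_id
)

Whether the CPU-side init is related to the leak is not established by this log; the warning and the leak error are reported independently.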
2025-12-04T09:59:13.2622215Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e9238188d8477a2.xml - 2025-12-04T09:59:13.2622389Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2623232Z FAILED [32.7218s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2623356Z Traceback (most recent call last): 2025-12-04T09:59:13.2623978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2624103Z getattr(self, test_name)() 2025-12-04T09:59:13.2624646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2624789Z fn() 2025-12-04T09:59:13.2625300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2625404Z method(*args, **kwargs) 2025-12-04T09:59:13.2625912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2626014Z method(*args, **kwargs) 2025-12-04T09:59:13.2626520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2626630Z with policy(): 2025-12-04T09:59:13.2627139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2627260Z raise RuntimeError(msg) 2025-12-04T09:59:13.2628459Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
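One more setup warning recurs in each session: c10d's barrier() picking the device from the current context. On recent PyTorch the process group can be bound to a device at init time, which addresses it (the rank handling here is an assumption about the launcher):

import os
import torch
import torch.distributed as dist

rank = int(os.environ["RANK"])  # assumed to be set by the launcher
dist.init_process_group(
    backend="nccl",
    device_id=torch.device("cuda", rank % torch.cuda.device_count()),
)
dist.barrier()  # no longer has to guess a device from the current context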
2025-12-04T09:59:13.2628467Z 2025-12-04T09:59:13.2628694Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2629360Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2629365Z 2025-12-04T09:59:13.2629630Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2629638Z 2025-12-04T09:59:13.2629819Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2629940Z Traceback (most recent call last): 2025-12-04T09:59:13.2630497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2630647Z getattr(self, test_name)() 2025-12-04T09:59:13.2631188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2631294Z fn() 2025-12-04T09:59:13.2631800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2631905Z method(*args, **kwargs) 2025-12-04T09:59:13.2632428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2632534Z method(*args, **kwargs) 2025-12-04T09:59:13.2633145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2633244Z with policy(): 2025-12-04T09:59:13.2633923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2634053Z raise RuntimeError(msg) 2025-12-04T09:59:13.2635205Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.2635210Z 2025-12-04T09:59:13.2635435Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2636083Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2636118Z 2025-12-04T09:59:13.2636377Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2636567Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.2636745Z ======================= 1 failed, 5 deselected in 32.94s ======================= 2025-12-04T09:59:13.2636880Z Got exit code 1 2025-12-04T09:59:13.2636985Z Retrying single test... 
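The "Retrying single test..." step reruns only the failing test in a fresh interpreter; note the new report hash in the next session and the stepcurrent line narrowing the run to one item. The overall shape is a retry loop over subprocesses. A generic sketch of that pattern, not the actual CI harness logic (the helper name, command line, and retry count are all invented):

import subprocess
import sys

def rerun_in_fresh_process(test_id: str, retries: int = 3) -> bool:
    # Invented helper: rerun one test id, each attempt in an isolated process.
    for _ in range(retries):
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "-v", test_id],
            check=False,
        )
        if result.returncode == 0:
            return True  # flaky: passed on a retry
    return False  # failed in every fresh process

A test that fails in every fresh process is what earns the FAILED CONSISTENTLY label seen at the top of this excerpt.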
2025-12-04T09:59:13.2637587Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9476e56094f0b738.xml 2025-12-04T09:59:13.2637758Z ============================= test session starts ============================== 2025-12-04T09:59:13.2638097Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2638201Z cachedir: .pytest_cache 2025-12-04T09:59:13.2638715Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2638833Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2638953Z configfile: pytest.ini 2025-12-04T09:59:13.2639473Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2639680Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.2640420Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2640528Z Running 1 items in this shard 2025-12-04T09:59:13.2640533Z 2025-12-04T09:59:13.2641530Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda I1204 09:34:52.924000 46888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 46940 2025-12-04T09:59:13.2642013Z I1204 09:34:52.925000 46888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 46941 2025-12-04T09:59:13.2642496Z I1204 09:34:52.925000 46888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 46942 2025-12-04T09:59:13.2643003Z I1204 09:34:52.926000 46888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 46943 2025-12-04T09:59:13.2644978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2645086Z _warn_cpu_init() 2025-12-04T09:59:13.2647143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2647253Z _warn_cpu_init() 2025-12-04T09:59:13.2649144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2649249Z _warn_cpu_init() 2025-12-04T09:59:13.2651171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2651299Z _warn_cpu_init() 2025-12-04T09:59:13.2652234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.2652339Z return func(*args, **kwargs) 2025-12-04T09:59:13.2652953Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2653479Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2654459Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2654946Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2655906Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2656398Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2657566Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2658109Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2659070Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2659563Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2660528Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2660974Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2661978Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2662470Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2664130Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.2664524Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2665197Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2666318Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2666716Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2667440Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2667983Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2668655Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2669274Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2670174Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2670628Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2671503Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2671866Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2672768Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2673211Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2674062Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.2674501Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2675349Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2675773Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2676645Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2677079Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2678545Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 2025-12-04T09:59:13.2678900Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2679489Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2680505Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2680827Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2681468Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2681960Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2682372Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2682844Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2683737Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2684186Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2685064Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2685447Z [rank3]:E1204 09:35:23.689000 46943 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2686300Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2686737Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2687595Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2688034Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2688904Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2689302Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2690162Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2690596Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2692125Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T09:59:13.2692479Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2693068Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2694053Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2694378Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2695022Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2695502Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2695915Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2696462Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2697610Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2698132Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2699156Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2699566Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2700525Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2701023Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2702009Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2702498Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2703472Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2703914Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2704885Z [rank2]:E1204 09:35:23.689000 46942 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2705403Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2707061Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.2707453Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2708126Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2709424Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2709755Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2710394Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2710879Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2710980Z dist init r=1, world=4 2025-12-04T09:59:13.2711070Z dist init r=0, world=4 2025-12-04T09:59:13.2711156Z dist init r=2, world=4 2025-12-04T09:59:13.2711250Z dist init r=3, world=4 2025-12-04T09:59:13.2712275Z [rank0]:[W1204 09:35:24.707597608 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2712374Z FAILED [32.3778s] [100%] 2025-12-04T09:59:13.2712381Z 2025-12-04T09:59:13.2712542Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2712817Z _____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda _____ 2025-12-04T09:59:13.2712928Z Traceback (most recent call last): 2025-12-04T09:59:13.2713412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2713509Z self._join_processes(fn) 2025-12-04T09:59:13.2714037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2714158Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2714707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2714805Z raise RuntimeError(error) 2025-12-04T09:59:13.2715035Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2715156Z Traceback (most recent call last): 2025-12-04T09:59:13.2715635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2715734Z getattr(self, test_name)() 2025-12-04T09:59:13.2716211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2716290Z fn() 2025-12-04T09:59:13.2716743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2716835Z method(*args, **kwargs) 2025-12-04T09:59:13.2717312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2717410Z method(*args, **kwargs) 2025-12-04T09:59:13.2717860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2717984Z with policy(): 2025-12-04T09:59:13.2718441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2718539Z raise RuntimeError(msg) 2025-12-04T09:59:13.2719602Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.2719608Z 2025-12-04T09:59:13.2719801Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2720404Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2720409Z 2025-12-04T09:59:13.2720643Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2720650Z 2025-12-04T09:59:13.2720656Z 2025-12-04T09:59:13.2720998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2721430Z Process 3 terminated with exit code 10, terminating remaining processes. 
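To chase this locally, the repro block the log keeps printing is self-contained; driving it programmatically only requires forwarding the two environment variables it mentions. A sketch, assuming a PyTorch checkout as the working directory:

import os
import subprocess
import sys

env = dict(
    os.environ,
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",  # force the leak checker on
    PYTORCH_PRINT_REPRO_ON_FAILURE="1",    # keep the repro hint ("0" suppresses it)
)
subprocess.run(
    [
        sys.executable,
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda",
    ],
    env=env,
    check=False,
)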
2025-12-04T09:59:13.2722229Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9476e56094f0b738.xml - 2025-12-04T09:59:13.2722408Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2723236Z FAILED [32.3778s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2723362Z Traceback (most recent call last): 2025-12-04T09:59:13.2723928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2724107Z getattr(self, test_name)() 2025-12-04T09:59:13.2724653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2724743Z fn() 2025-12-04T09:59:13.2725253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2725368Z method(*args, **kwargs) 2025-12-04T09:59:13.2725880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2725985Z method(*args, **kwargs) 2025-12-04T09:59:13.2726508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2726604Z with policy(): 2025-12-04T09:59:13.2727168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2727279Z raise RuntimeError(msg) 2025-12-04T09:59:13.2728477Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.2728483Z 2025-12-04T09:59:13.2728706Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2729366Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2729406Z 2025-12-04T09:59:13.2729680Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2729858Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.2730039Z ====================== 1 failed, 26 deselected in 32.60s ======================= 2025-12-04T09:59:13.2730179Z Got exit code 1 2025-12-04T09:59:13.2730283Z Retrying single test... 
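Each fresh session reloads the same hypothesis profile ('pytorch_ci' with database=None, max_examples=50, derandomize=True). That header line corresponds to a standard hypothesis profile registration; a sketch of how such a profile is defined and loaded (the actual registration site inside PyTorch is not shown in this log):

from hypothesis import HealthCheck, settings

settings.register_profile(
    "pytorch_ci",
    database=None,      # no example database, per the session header
    max_examples=50,
    derandomize=True,   # deterministic example generation for CI
    suppress_health_check=[HealthCheck.too_slow],
)
settings.load_profile("pytorch_ci")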
2025-12-04T09:59:13.2730909Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-207ff9590d724b3a.xml 2025-12-04T09:59:13.2731071Z ============================= test session starts ============================== 2025-12-04T09:59:13.2731417Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2731535Z cachedir: .pytest_cache 2025-12-04T09:59:13.2732048Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2732172Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2732280Z configfile: pytest.ini 2025-12-04T09:59:13.2732826Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2733040Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.2733880Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2733982Z Running 1 items in this shard 2025-12-04T09:59:13.2733987Z 2025-12-04T09:59:13.2734891Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda I1204 09:35:30.103000 47225 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 47277 2025-12-04T09:59:13.2735338Z I1204 09:35:30.104000 47225 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 47278 2025-12-04T09:59:13.2735780Z I1204 09:35:30.105000 47225 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 47279 2025-12-04T09:59:13.2736252Z I1204 09:35:30.106000 47225 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 47280 2025-12-04T09:59:13.2738491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2738599Z _warn_cpu_init() 2025-12-04T09:59:13.2740637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2740748Z _warn_cpu_init() 2025-12-04T09:59:13.2742757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2742894Z _warn_cpu_init() 2025-12-04T09:59:13.2744913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2745040Z _warn_cpu_init() 2025-12-04T09:59:13.2746043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.2746158Z return func(*args, **kwargs) 2025-12-04T09:59:13.2746626Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2747168Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2748175Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2748808Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2749824Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2750187Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2751043Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2751510Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2752358Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2752787Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2753641Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2754040Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2754933Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2755371Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2756847Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 2025-12-04T09:59:13.2757199Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2757790Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2758787Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2759138Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2759778Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2760265Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2760670Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2761147Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2762043Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2762496Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2763373Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2763734Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2764609Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2765049Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2765898Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.2766325Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2767210Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2767609Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2768476Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2768916Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2770379Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.2770743Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2771329Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2772346Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2772669Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2773310Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2773796Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2774203Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2774673Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2775559Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2776013Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2777168Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2777607Z [rank2]:E1204 09:36:07.922000 47279 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2778575Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2779069Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2780025Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2780514Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2781508Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2781958Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2782931Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2783421Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2785110Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.2785963Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2786624Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2787753Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2788121Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2788840Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2789438Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2789847Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2790315Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2791204Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2791658Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2792565Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2792927Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2793781Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2794221Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2795096Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2795533Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2796394Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2796788Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2797648Z [rank3]:E1204 09:36:07.922000 47280 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2798107Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2799580Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 583991296 and is now 625934336. 2025-12-04T09:59:13.2799930Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2800514Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2801507Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2801830Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2802475Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2802959Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2803057Z dist init r=2, world=4 2025-12-04T09:59:13.2803142Z dist init r=0, world=4 2025-12-04T09:59:13.2803228Z dist init r=3, world=4 2025-12-04T09:59:13.2803321Z dist init r=1, world=4 2025-12-04T09:59:13.2804348Z [rank0]:[W1204 09:36:08.943788160 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2804440Z FAILED [39.4702s] [100%] 2025-12-04T09:59:13.2804445Z 2025-12-04T09:59:13.2804607Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2804883Z _____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda _____ 2025-12-04T09:59:13.2804998Z Traceback (most recent call last): 2025-12-04T09:59:13.2805480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2805578Z self._join_processes(fn) 2025-12-04T09:59:13.2806108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2806235Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2806781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2806883Z raise RuntimeError(error) 2025-12-04T09:59:13.2807112Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2807236Z Traceback (most recent call last): 2025-12-04T09:59:13.2807715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2807814Z getattr(self, test_name)() 2025-12-04T09:59:13.2808298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2808377Z fn() 2025-12-04T09:59:13.2808833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2808923Z method(*args, **kwargs) 2025-12-04T09:59:13.2809398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2809500Z method(*args, **kwargs) 2025-12-04T09:59:13.2809947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2810057Z with policy(): 2025-12-04T09:59:13.2810519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2810617Z raise RuntimeError(msg) 2025-12-04T09:59:13.2811686Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 2025-12-04T09:59:13.2811691Z 2025-12-04T09:59:13.2811879Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2812473Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2812488Z 2025-12-04T09:59:13.2812722Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2812730Z 2025-12-04T09:59:13.2812735Z 2025-12-04T09:59:13.2812927Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2813169Z Process 0 terminated with exit code 10, terminating remaining processes. 
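The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit") refers to the explicit teardown sketched below. This is a hedged illustration, not the test harness's code: run_worker is a placeholder, and the default env:// rendezvous assumes MASTER_ADDR and MASTER_PORT are set in the environment.

import torch
import torch.distributed as dist

def run_worker(rank: int, world_size: int) -> None:
    # env:// rendezvous; MASTER_ADDR / MASTER_PORT are assumed to be exported.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    try:
        pass  # per-rank collective work goes here
    finally:
        # Explicit teardown avoids the resource-leak warning seen at process exit.
        dist.destroy_process_group()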
2025-12-04T09:59:13.2813874Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-207ff9590d724b3a.xml - 2025-12-04T09:59:13.2814035Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2814763Z FAILED [39.4702s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2814869Z Traceback (most recent call last): 2025-12-04T09:59:13.2815367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2815488Z getattr(self, test_name)() 2025-12-04T09:59:13.2815963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2816047Z fn() 2025-12-04T09:59:13.2816575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2816853Z method(*args, **kwargs) 2025-12-04T09:59:13.2817517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2817622Z method(*args, **kwargs) 2025-12-04T09:59:13.2818136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2818230Z with policy(): 2025-12-04T09:59:13.2818790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2818913Z raise RuntimeError(msg) 2025-12-04T09:59:13.2820104Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 2025-12-04T09:59:13.2820110Z 2025-12-04T09:59:13.2820335Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2821220Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2821300Z 2025-12-04T09:59:13.2821571Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2821752Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.2821931Z ====================== 1 failed, 26 deselected in 39.69s ======================= 2025-12-04T09:59:13.2822079Z Got exit code 1 2025-12-04T09:59:13.2822670Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2823083Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.2823698Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f664e87214ff2805.xml 2025-12-04T09:59:13.2823859Z ============================= test session starts ============================== 2025-12-04T09:59:13.2824213Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2824319Z cachedir: .pytest_cache 2025-12-04T09:59:13.2824835Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2824966Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2825071Z configfile: pytest.ini 2025-12-04T09:59:13.2825617Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2825826Z collecting ... collected 60 items / 6 deselected / 54 selected 2025-12-04T09:59:13.2825965Z stepcurrent: skipping 6 already run items. 2025-12-04T09:59:13.2826086Z Running 21 items in this shard 2025-12-04T09:59:13.2826092Z 2025-12-04T09:59:13.2827127Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda I1204 09:36:14.343000 47562 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 47614 2025-12-04T09:59:13.2827636Z I1204 09:36:14.344000 47562 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 47615 2025-12-04T09:59:13.2828165Z I1204 09:36:14.345000 47562 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 47616 2025-12-04T09:59:13.2828660Z I1204 09:36:14.346000 47562 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 47617 2025-12-04T09:59:13.2830692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2830793Z _warn_cpu_init() 2025-12-04T09:59:13.2832980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.2833080Z _warn_cpu_init() 2025-12-04T09:59:13.2835040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2835161Z _warn_cpu_init() 2025-12-04T09:59:13.2836163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2836290Z _init_core_state( 2025-12-04T09:59:13.2837953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2838126Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2839115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2839226Z _init_core_state( 2025-12-04T09:59:13.2840893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2841062Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2842050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2842145Z _init_core_state( 2025-12-04T09:59:13.2843840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2844004Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2845963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.2846063Z _warn_cpu_init() 2025-12-04T09:59:13.2847079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2847177Z _init_core_state( 2025-12-04T09:59:13.2848837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2849002Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2850652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2850847Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2852530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2852695Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2854348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2854518Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2859243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.2859653Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.2864169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.2864576Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.2869348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.2869744Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.2873729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.2874081Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.2874771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2874873Z return func(*args, **kwargs) 2025-12-04T09:59:13.2875601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2875698Z return func(*args, **kwargs) 2025-12-04T09:59:13.2876370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2876480Z return func(*args, **kwargs) 2025-12-04T09:59:13.2877157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2877260Z return func(*args, **kwargs) 2025-12-04T09:59:13.2877930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2878025Z return func(*args, **kwargs) 2025-12-04T09:59:13.2878733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2878830Z return func(*args, **kwargs) 2025-12-04T09:59:13.2879514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2879609Z return func(*args, **kwargs) 2025-12-04T09:59:13.2880279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2880383Z return func(*args, **kwargs) 2025-12-04T09:59:13.2881287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.2881397Z return func(*args, **kwargs) 2025-12-04T09:59:13.2881804Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2882303Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2883193Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2883643Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2884537Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2884889Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2885752Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2886184Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2887032Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2887474Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2888342Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2888749Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2889605Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2890048Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2891546Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 720306176 and is now 10532880384. 
2025-12-04T09:59:13.2891876Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2892467Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2893461Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2893815Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2894446Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2894962Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2895360Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2895832Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2896978Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2897493Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2898493Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2898889Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2899857Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2900342Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2901299Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2901827Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2902791Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2903250Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2904212Z [rank2]:E1204 09:36:24.226000 47616 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2904776Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2906445Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.2906813Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2907482Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2908613Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2909132Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2909794Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2910286Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2910682Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2911153Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2912050Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2912497Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2913373Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2913721Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2914590Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2915022Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2915897Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2916336Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2917185Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2917774Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2918710Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2919187Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2920908Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.2921436Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2922155Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2923281Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2923717Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2924434Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2924992Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2925442Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2925978Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2926991Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2927494Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2928490Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2928884Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2929847Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2930374Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2931344Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2931835Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2932799Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2933288Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2934323Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2934796Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2936430Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T09:59:13.2937002Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2937687Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2938840Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2939213Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2939923Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2940482Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2940581Z dist init r=3, world=4 2025-12-04T09:59:13.2940682Z dist init r=0, world=4 2025-12-04T09:59:13.2940796Z dist init r=2, world=4 2025-12-04T09:59:13.2940893Z dist init r=1, world=4 2025-12-04T09:59:13.2942047Z [rank0]:[W1204 09:36:24.244897733 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2943205Z [rank3]:[W1204 09:36:24.245153369 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2944345Z [rank2]:[W1204 09:36:24.248749742 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2945516Z [rank1]:[W1204 09:36:24.253210955 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2945622Z FAILED [27.1464s] [ 4%] 2025-12-04T09:59:13.2945629Z 2025-12-04T09:59:13.2945788Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2946093Z ____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda _____ 2025-12-04T09:59:13.2946210Z Traceback (most recent call last): 2025-12-04T09:59:13.2946767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2946878Z self._join_processes(fn) 2025-12-04T09:59:13.2947503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2947645Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2948256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2948376Z raise RuntimeError(error) 2025-12-04T09:59:13.2948722Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.2948843Z Traceback (most recent call last): 2025-12-04T09:59:13.2949479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2949586Z getattr(self, test_name)() 2025-12-04T09:59:13.2950102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2950213Z fn() 2025-12-04T09:59:13.2950694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2950802Z method(*args, **kwargs) 2025-12-04T09:59:13.2951299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2951398Z method(*args, **kwargs) 2025-12-04T09:59:13.2951876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2951966Z with policy(): 2025-12-04T09:59:13.2952449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2952550Z raise RuntimeError(msg) 2025-12-04T09:59:13.2953690Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.2953698Z 2025-12-04T09:59:13.2953910Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2954542Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2954547Z 2025-12-04T09:59:13.2954916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2954921Z 2025-12-04T09:59:13.2954925Z 2025-12-04T09:59:13.2955117Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2955355Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.2956061Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f664e87214ff2805.xml - 2025-12-04T09:59:13.2956215Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2956993Z FAILED [27.1464s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.2957112Z Traceback (most recent call last): 2025-12-04T09:59:13.2957609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2957706Z getattr(self, test_name)() 2025-12-04T09:59:13.2958183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2958275Z fn() 2025-12-04T09:59:13.2958726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2958823Z method(*args, **kwargs) 2025-12-04T09:59:13.2959306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2959406Z method(*args, **kwargs) 2025-12-04T09:59:13.2959868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2959952Z with policy(): 2025-12-04T09:59:13.2960401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2960508Z raise RuntimeError(msg) 2025-12-04T09:59:13.2961582Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.2961612Z 2025-12-04T09:59:13.2961819Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2962421Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2962461Z 2025-12-04T09:59:13.2962701Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2962877Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.2963034Z ======================= 1 failed, 6 deselected in 27.36s ======================= 2025-12-04T09:59:13.2963133Z Got exit code 1 2025-12-04T09:59:13.2963226Z Retrying single test... 
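The repro command printed above relies on PyTorch's CUDA memory-leak check, which compares per-device memory before and after the test and fails when both the caching-allocator and driver-level numbers grow. A minimal sketch of that comparison, assuming the counters are read through the public torch.cuda APIs (illustrative only, not the actual CudaMemoryLeakCheck code used by the test harness):

import torch

def snapshot(device):
    # bytes currently held by the caching allocator on this device
    allocated = torch.cuda.memory_allocated(device)
    # driver-level usage: total minus free, as reported by cudaMemGetInfo
    free, total = torch.cuda.mem_get_info(device)
    return allocated, total - free

def check_for_leak(run_test):
    devices = range(torch.cuda.device_count())
    before = [snapshot(d) for d in devices]
    run_test()                       # the test body under suspicion
    torch.cuda.empty_cache()         # drop cached blocks before re-measuring
    after = [snapshot(d) for d in devices]
    for d, ((alloc0, drv0), (alloc1, drv1)) in enumerate(zip(before, after)):
        if alloc1 > alloc0 and drv1 > drv0:
            raise RuntimeError(
                f"possible CUDA leak on device {d}: allocator {alloc0} -> {alloc1}, "
                f"driver {drv0} -> {drv1}"
            )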
2025-12-04T09:59:13.2963775Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-def950b7d24ceea9.xml 2025-12-04T09:59:13.2963930Z ============================= test session starts ============================== 2025-12-04T09:59:13.2964244Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2964343Z cachedir: .pytest_cache 2025-12-04T09:59:13.2964811Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2964922Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2965030Z configfile: pytest.ini 2025-12-04T09:59:13.2965503Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2965694Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.2966366Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2966465Z Running 1 items in this shard 2025-12-04T09:59:13.2966471Z 2025-12-04T09:59:13.2967395Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda I1204 09:36:46.303000 48667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 48719 2025-12-04T09:59:13.2967861Z I1204 09:36:46.304000 48667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 48720 2025-12-04T09:59:13.2968298Z I1204 09:36:46.305000 48667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 48721 2025-12-04T09:59:13.2968743Z I1204 09:36:46.306000 48667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 48722 2025-12-04T09:59:13.2970564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2970671Z _warn_cpu_init() 2025-12-04T09:59:13.2972468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2972571Z _warn_cpu_init() 2025-12-04T09:59:13.2974353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2974511Z _warn_cpu_init() 2025-12-04T09:59:13.2975415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2975501Z _init_core_state( 2025-12-04T09:59:13.2976485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2976577Z _init_core_state( 2025-12-04T09:59:13.2978458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2978628Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2980342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2980505Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2981524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2981633Z _init_core_state( 2025-12-04T09:59:13.2983376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2983546Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2985584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2985697Z _warn_cpu_init() 2025-12-04T09:59:13.2986713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 
2025-12-04T09:59:13.2986811Z _init_core_state( 2025-12-04T09:59:13.2988533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2988725Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2990385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2990559Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2992072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2992218Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2993729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2993874Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2997935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.2998284Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3002301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3002648Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3006660Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3007065Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3011029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3011378Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3012086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.3012196Z return func(*args, **kwargs) 2025-12-04T09:59:13.3012886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3012987Z return func(*args, **kwargs) 2025-12-04T09:59:13.3013658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3013754Z return func(*args, **kwargs) 2025-12-04T09:59:13.3014437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3014537Z return func(*args, **kwargs) 2025-12-04T09:59:13.3015235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3015343Z return func(*args, **kwargs) 2025-12-04T09:59:13.3016011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3016115Z return func(*args, **kwargs) 2025-12-04T09:59:13.3017046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3017155Z return func(*args, **kwargs) 2025-12-04T09:59:13.3017957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3018067Z return func(*args, **kwargs) 2025-12-04T09:59:13.3019080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.3019219Z return func(*args, **kwargs) 2025-12-04T09:59:13.3019680Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3020228Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3021461Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3021992Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3022986Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3023393Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3024357Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3024842Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3025813Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3026363Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3027334Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3027778Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3028758Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3029300Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3030966Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 
2025-12-04T09:59:13.3031346Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3032001Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3033252Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3033583Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3034267Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3034750Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3035152Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3035632Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3036525Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3036984Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3037860Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3038223Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3039076Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3039509Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3040395Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3040826Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3041685Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3042083Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3042971Z [rank0]:E1204 09:36:56.145000 48719 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3043409Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3044876Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 718209024 and is now 10532880384. 2025-12-04T09:59:13.3045210Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3045820Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3046828Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3047179Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3047819Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3048302Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3048701Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3049181Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3050066Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3050531Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3051402Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3051759Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3052639Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3053071Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3053932Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3054359Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3055215Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3055634Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3056580Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3057237Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3058904Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 607059968 and is now 10421731328. 2025-12-04T09:59:13.3059315Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3059976Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3061137Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3061503Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3062225Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3062772Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3063230Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3063773Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3064770Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3065278Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3066269Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3066668Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3067667Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3068159Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3069205Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3069640Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3070533Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3070928Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3071787Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3072230Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3073715Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T09:59:13.3074075Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3074686Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3075686Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3076006Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3076653Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3077135Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3077229Z dist init r=1, world=4 2025-12-04T09:59:13.3077327Z dist init r=0, world=4 2025-12-04T09:59:13.3077414Z dist init r=3, world=4 2025-12-04T09:59:13.3077499Z dist init r=2, world=4 2025-12-04T09:59:13.3078536Z [rank1]:[W1204 09:36:56.161187001 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3079552Z [rank0]:[W1204 09:36:56.164696242 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3080615Z [rank3]:[W1204 09:36:56.164914126 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3081629Z [rank2]:[W1204 09:36:56.167377245 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3081736Z FAILED [26.9482s] [100%] 2025-12-04T09:59:13.3081742Z 2025-12-04T09:59:13.3081871Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3082144Z ____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda _____ 2025-12-04T09:59:13.3082270Z Traceback (most recent call last): 2025-12-04T09:59:13.3082782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3082899Z self._join_processes(fn) 2025-12-04T09:59:13.3083419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3083545Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3084095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3084195Z raise RuntimeError(error) 2025-12-04T09:59:13.3084404Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.3084527Z Traceback (most recent call last): 2025-12-04T09:59:13.3085007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3085141Z getattr(self, test_name)() 2025-12-04T09:59:13.3085622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3085730Z fn() 2025-12-04T09:59:13.3086193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3086286Z method(*args, **kwargs) 2025-12-04T09:59:13.3086732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3086833Z method(*args, **kwargs) 2025-12-04T09:59:13.3087282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3087374Z with policy(): 2025-12-04T09:59:13.3087826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3087922Z raise RuntimeError(msg) 2025-12-04T09:59:13.3089008Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 2025-12-04T09:59:13.3089016Z 2025-12-04T09:59:13.3089208Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3089820Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3089826Z 2025-12-04T09:59:13.3090059Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3090064Z 2025-12-04T09:59:13.3090068Z 2025-12-04T09:59:13.3090275Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3090506Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.3091247Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-def950b7d24ceea9.xml - 2025-12-04T09:59:13.3091413Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3092153Z FAILED [26.9482s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.3092270Z Traceback (most recent call last): 2025-12-04T09:59:13.3092756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3092853Z getattr(self, test_name)() 2025-12-04T09:59:13.3093339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3093418Z fn() 2025-12-04T09:59:13.3093893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3093996Z method(*args, **kwargs) 2025-12-04T09:59:13.3094445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3094553Z method(*args, **kwargs) 2025-12-04T09:59:13.3094996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3095081Z with policy(): 2025-12-04T09:59:13.3095536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3095630Z raise RuntimeError(msg) 2025-12-04T09:59:13.3097000Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 2025-12-04T09:59:13.3097011Z 2025-12-04T09:59:13.3097265Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3097936Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3097942Z 2025-12-04T09:59:13.3098213Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3098393Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.3098582Z ====================== 1 failed, 26 deselected in 27.17s ======================= 2025-12-04T09:59:13.3098677Z Got exit code 1 2025-12-04T09:59:13.3098782Z Retrying single test... 
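The ProcessGroupNCCL warnings in both runs point at the same cleanup gap: destroy_process_group() is never called before the worker processes exit. A minimal sketch of the shutdown order the warning asks for (the rank/world_size plumbing is assumed to come from the launcher, not from this test file):

import torch.distributed as dist

def worker(rank, world_size):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        pass  # training / test body goes here
    finally:
        dist.barrier()                # let every rank finish outstanding collectives
        dist.destroy_process_group()  # release NCCL communicators before the process exits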
2025-12-04T09:59:13.3099410Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-89dfbd7b5cd71317.xml 2025-12-04T09:59:13.3099571Z ============================= test session starts ============================== 2025-12-04T09:59:13.3099920Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3100037Z cachedir: .pytest_cache 2025-12-04T09:59:13.3100548Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3100680Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3100784Z configfile: pytest.ini 2025-12-04T09:59:13.3101316Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3101543Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.3102296Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3102416Z Running 1 items in this shard 2025-12-04T09:59:13.3102422Z 2025-12-04T09:59:13.3103481Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda I1204 09:37:18.274000 49772 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 49824 2025-12-04T09:59:13.3103982Z I1204 09:37:18.275000 49772 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 49825 2025-12-04T09:59:13.3104480Z I1204 09:37:18.275000 49772 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 49826 2025-12-04T09:59:13.3104968Z I1204 09:37:18.276000 49772 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 49827 2025-12-04T09:59:13.3107042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3107144Z _warn_cpu_init() 2025-12-04T09:59:13.3109272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3109404Z _warn_cpu_init() 2025-12-04T09:59:13.3111378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3111498Z _warn_cpu_init() 2025-12-04T09:59:13.3112497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.3112598Z _init_core_state( 2025-12-04T09:59:13.3113576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.3113681Z _init_core_state( 2025-12-04T09:59:13.3115333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3115494Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3117177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3117336Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3118593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.3118687Z _init_core_state( 2025-12-04T09:59:13.3120227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3120370Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3122703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3122812Z _warn_cpu_init() 2025-12-04T09:59:13.3123828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 
2025-12-04T09:59:13.3123937Z _init_core_state( 2025-12-04T09:59:13.3125644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3125859Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3127625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3127794Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3129497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3129672Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3131372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3131534Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3136065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3136515Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3141242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3142018Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3146523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3146950Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3151391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3151766Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3152462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
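If the AccumulateGrad stream mismatch reported above is intentional, the warning names the switch that silences it; a minimal sketch, assuming a PyTorch build that exposes the toggle (as this one evidently does):

    import torch

    # Silence the stream-mismatch warning when the mismatch is intentional,
    # exactly as the warning text above suggests.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)

    # Otherwise, drop lingering references to the autograd graph (e.g. the loss
    # tensor) between iterations so stale AccumulateGrad nodes are not kept alive.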
2025-12-04T09:59:13.3152560Z return func(*args, **kwargs)
2025-12-04T09:59:13.3153244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3153344Z return func(*args, **kwargs)
2025-12-04T09:59:13.3154018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3154128Z return func(*args, **kwargs)
2025-12-04T09:59:13.3154829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3154935Z return func(*args, **kwargs)
2025-12-04T09:59:13.3155605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3155696Z return func(*args, **kwargs)
2025-12-04T09:59:13.3156371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3156464Z return func(*args, **kwargs)
2025-12-04T09:59:13.3157157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3157265Z return func(*args, **kwargs)
2025-12-04T09:59:13.3157934Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3158062Z return func(*args, **kwargs)
2025-12-04T09:59:13.3158943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
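The barrier() warning just above suggests pinning the process group to a device at initialization time. A minimal sketch, assuming the NCCL backend, a per-process `rank`, and MASTER_ADDR/MASTER_PORT set in the environment (as in this harness); `device_id` support in init_process_group depends on the PyTorch version:

    import torch
    import torch.distributed as dist

    rank = 0          # placeholder: this process's rank
    world_size = 4    # placeholder: total number of ranks

    # Passing an explicit device lets barrier() use the right GPU instead of
    # guessing from the current context, which mutes the warning above.
    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device(f"cuda:{rank}"),
    )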
2025-12-04T09:59:13.3159037Z return func(*args, **kwargs) 2025-12-04T09:59:13.3159457Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3159929Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3160827Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3161280Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3162166Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3162515Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3163365Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3163808Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3164680Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3165122Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3165972Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3166373Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3167258Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3167700Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3169189Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T09:59:13.3169511Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3170126Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3171120Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3171472Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3172106Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3172591Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3172998Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3173469Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3174359Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3174805Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3175687Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3176040Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3177205Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3177705Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3178662Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3179153Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3180111Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3180603Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3181566Z [rank0]:E1204 09:37:27.971000 49824 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3182058Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3183737Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 720306176 and is now 10532880384. 2025-12-04T09:59:13.3184146Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3184814Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3185969Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3186340Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3187051Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3187597Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3188058Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3188584Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3189734Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3190186Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3191058Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3191451Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3192305Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3192754Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3193603Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3194042Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3194917Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3195318Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3196184Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3196621Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3198102Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.3198480Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3199074Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3200068Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3200398Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3201035Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3201519Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3201937Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3202412Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3203313Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3203764Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3204672Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3205037Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3205894Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3206339Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3207189Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3207666Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3208530Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3208928Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3209790Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3210250Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3211746Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 607059968 and is now 10421731328. 
2025-12-04T09:59:13.3212097Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3212693Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3213881Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3214238Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3214915Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3215423Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3215527Z dist init r=2, world=4 2025-12-04T09:59:13.3215627Z dist init r=3, world=4 2025-12-04T09:59:13.3215719Z dist init r=1, world=4 2025-12-04T09:59:13.3215823Z dist init r=0, world=4 2025-12-04T09:59:13.3217171Z [rank2]:[W1204 09:37:28.992693420 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3218375Z [rank3]:[W1204 09:37:28.995444652 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3219522Z [rank1]:[W1204 09:37:28.998231921 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3220671Z [rank0]:[W1204 09:37:28.000930885 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3220987Z FAILED [27.1321s] [100%] 2025-12-04T09:59:13.3220998Z 2025-12-04T09:59:13.3221166Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3221553Z ____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda _____ 2025-12-04T09:59:13.3221678Z Traceback (most recent call last): 2025-12-04T09:59:13.3222234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3222346Z self._join_processes(fn) 2025-12-04T09:59:13.3222932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3223078Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3223685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3223842Z raise RuntimeError(error) 2025-12-04T09:59:13.3224078Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3224195Z Traceback (most recent call last): 2025-12-04T09:59:13.3224749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3224898Z getattr(self, test_name)() 2025-12-04T09:59:13.3225429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3225530Z fn() 2025-12-04T09:59:13.3226039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3226150Z method(*args, **kwargs) 2025-12-04T09:59:13.3226655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3226760Z method(*args, **kwargs) 2025-12-04T09:59:13.3227269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3227365Z with policy(): 2025-12-04T09:59:13.3227879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3227997Z raise RuntimeError(msg) 2025-12-04T09:59:13.3229204Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 607059968 and is now 10421731328. 2025-12-04T09:59:13.3229210Z 2025-12-04T09:59:13.3229439Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3230113Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3230121Z 2025-12-04T09:59:13.3230390Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3230396Z 2025-12-04T09:59:13.3230402Z 2025-12-04T09:59:13.3230661Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3230929Z Process 3 terminated with exit code 10, terminating remaining processes. 
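The ProcessGroupNCCL warning above is about shutdown hygiene rather than the leak itself; a minimal sketch of the teardown it asks for, assuming torch.distributed was initialized earlier in the process:

    import torch.distributed as dist

    # Explicitly tear down the process group before the process exits so NCCL
    # resources are released, which avoids the destroy_process_group() warning above.
    if dist.is_initialized():
        dist.destroy_process_group()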
2025-12-04T09:59:13.3231746Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-89dfbd7b5cd71317.xml - 2025-12-04T09:59:13.3231913Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3232867Z FAILED [27.1321s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3233095Z Traceback (most recent call last): 2025-12-04T09:59:13.3233587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3233739Z getattr(self, test_name)() 2025-12-04T09:59:13.3234222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3234305Z fn() 2025-12-04T09:59:13.3234762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3234855Z method(*args, **kwargs) 2025-12-04T09:59:13.3235307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3235398Z method(*args, **kwargs) 2025-12-04T09:59:13.3235845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3235969Z with policy(): 2025-12-04T09:59:13.3236603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3236721Z raise RuntimeError(msg) 2025-12-04T09:59:13.3237859Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 607059968 and is now 10421731328. 2025-12-04T09:59:13.3237895Z 2025-12-04T09:59:13.3238099Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3238746Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3238751Z 2025-12-04T09:59:13.3239000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3239184Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
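The leak checker compares caching-allocator and driver-level memory before and after the test. A rough local approximation of that bookkeeping, assuming a CUDA device is available; the CI harness uses its own checker (enabled via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), so these numbers are only illustrative:

    import torch

    device = 0  # placeholder device index

    # Snapshot allocator and driver-level usage before the suspect code runs.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)

    # ... run the workload under suspicion here ...

    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    print("caching allocator delta:", alloc_after - alloc_before)
    print("driver-allocated delta:", free_before - free_after)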
2025-12-04T09:59:13.3239353Z ====================== 1 failed, 26 deselected in 27.35s ======================= 2025-12-04T09:59:13.3239448Z Got exit code 1 2025-12-04T09:59:13.3240012Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3240398Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.3240987Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bdae057bafb686b9.xml 2025-12-04T09:59:13.3241142Z ============================= test session starts ============================== 2025-12-04T09:59:13.3241468Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3241576Z cachedir: .pytest_cache 2025-12-04T09:59:13.3242063Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3242177Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3242288Z configfile: pytest.ini 2025-12-04T09:59:13.3242821Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3243033Z collecting ... collected 60 items / 7 deselected / 53 selected 2025-12-04T09:59:13.3243167Z stepcurrent: skipping 7 already run items. 2025-12-04T09:59:13.3243270Z Running 20 items in this shard 2025-12-04T09:59:13.3243275Z 2025-12-04T09:59:13.3244292Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda I1204 09:37:50.204000 50877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 50929 2025-12-04T09:59:13.3244760Z I1204 09:37:50.205000 50877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 50930 2025-12-04T09:59:13.3245238Z I1204 09:37:50.205000 50877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 50931 2025-12-04T09:59:13.3245730Z I1204 09:37:50.206000 50877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 50932 2025-12-04T09:59:13.3247660Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3247764Z _warn_cpu_init() 2025-12-04T09:59:13.3249720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.3249876Z _warn_cpu_init() 2025-12-04T09:59:13.3251661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3251761Z _warn_cpu_init() 2025-12-04T09:59:13.3252683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3252779Z _init_core_state( 2025-12-04T09:59:13.3253695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3253786Z _init_core_state( 2025-12-04T09:59:13.3254710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3254799Z _init_core_state( 2025-12-04T09:59:13.3256411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3256602Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3258482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3258653Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3260395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3260579Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3262592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
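The NO_SHARD downgrade above occurs because each FSDP instance here ends up in a group of world size 1; the strategy can still be requested explicitly and takes effect only when the group has more than one rank. A minimal sketch, assuming an initialized process group and a module `model`:

    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

    # Request SHARD_GRAD_OP explicitly; with a single-rank group FSDP still falls
    # back to NO_SHARD, exactly as the warning above reports.
    fsdp_model = FSDP(
        model,  # assumed to be defined by the caller
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
    )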
2025-12-04T09:59:13.3262701Z _warn_cpu_init() 2025-12-04T09:59:13.3263734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3263866Z _init_core_state( 2025-12-04T09:59:13.3265583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3265781Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3267488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3267653Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3269550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3269700Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3271219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3271366Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3275441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.3275801Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3279830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3280241Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3284234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3284595Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3288634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.3288982Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
2025-12-04T09:59:13.3289681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3289782Z return func(*args, **kwargs)
2025-12-04T09:59:13.3290487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3290598Z return func(*args, **kwargs)
2025-12-04T09:59:13.3291281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3291397Z return func(*args, **kwargs)
2025-12-04T09:59:13.3292072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3292170Z return func(*args, **kwargs)
2025-12-04T09:59:13.3292849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3292975Z return func(*args, **kwargs)
2025-12-04T09:59:13.3293647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3293753Z return func(*args, **kwargs)
2025-12-04T09:59:13.3294452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3294559Z return func(*args, **kwargs)
2025-12-04T09:59:13.3295236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3295329Z return func(*args, **kwargs)
2025-12-04T09:59:13.3296223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
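The ``NO_SHARD``/full_state_dict warnings above come from gathering a state dict; the intent can be made explicit with the state_dict_type context manager. A minimal sketch, assuming `fsdp_model` is an FSDP-wrapped module on an initialized process group:

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import (
        FullyShardedDataParallel as FSDP,
        StateDictType,
        FullStateDictConfig,
    )

    # Ask for a full (unsharded) state dict explicitly; under NO_SHARD this is what
    # is returned anyway, which is what the warning above points out.
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT, cfg):
        state = fsdp_model.state_dict()

    if dist.get_rank() == 0:
        torch.save(state, "model_full_state.pt")  # rank 0 holds the gathered weights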
2025-12-04T09:59:13.3296385Z return func(*args, **kwargs) 2025-12-04T09:59:13.3296997Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3297536Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3298538Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3299060Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3300043Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3300454Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3301456Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3301956Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3302919Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3303403Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3304408Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3304856Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3305831Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3306325Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3308040Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 
2025-12-04T09:59:13.3308442Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3309235Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3310269Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3310592Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3311244Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3311732Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3312145Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3312617Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3313507Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3313966Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3314873Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3315239Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3316100Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3316548Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3317402Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3317867Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3318729Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3319124Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3319992Z [rank0]:E1204 09:37:59.979000 50929 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3320455Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3322411Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 2025-12-04T09:59:13.3322847Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3323520Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3324680Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3325046Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3325776Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3326329Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3326795Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3327328Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3328338Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3328900Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3329893Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3330303Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3331264Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3331766Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3332762Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3333255Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3334259Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3334659Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3335518Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3335998Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3337968Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T09:59:13.3338339Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3339012Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3340192Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3340557Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3341279Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3341823Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3342282Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3342813Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3343848Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3344363Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3345352Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3345757Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3346721Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3347247Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3348212Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3348806Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3349812Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3350240Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3351108Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3351569Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3353078Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T09:59:13.3353404Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3353998Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3355032Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3355358Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3355998Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3356479Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3356582Z dist init r=0, world=4 2025-12-04T09:59:13.3356669Z dist init r=2, world=4 2025-12-04T09:59:13.3356757Z dist init r=3, world=4 2025-12-04T09:59:13.3356855Z dist init r=1, world=4 2025-12-04T09:59:13.3357903Z [rank0]:[W1204 09:38:00.997136005 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3358927Z [rank1]:[W1204 09:38:00.998519129 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3359937Z [rank2]:[W1204 09:38:00.998744042 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3360975Z [rank3]:[W1204 09:38:00.999515853 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3361077Z FAILED [27.4146s] [ 5%] 2025-12-04T09:59:13.3361082Z 2025-12-04T09:59:13.3361216Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3361516Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.3361624Z Traceback (most recent call last): 2025-12-04T09:59:13.3362113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3362247Z self._join_processes(fn) 2025-12-04T09:59:13.3362768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3362902Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3363440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3363571Z raise RuntimeError(error) 2025-12-04T09:59:13.3363791Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3363898Z Traceback (most recent call last): 2025-12-04T09:59:13.3364375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3364484Z getattr(self, test_name)() 2025-12-04T09:59:13.3364960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3365053Z fn() 2025-12-04T09:59:13.3365503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3365597Z method(*args, **kwargs) 2025-12-04T09:59:13.3366056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3366152Z method(*args, **kwargs) 2025-12-04T09:59:13.3366608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3366695Z with policy(): 2025-12-04T09:59:13.3367146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3367254Z raise RuntimeError(msg) 2025-12-04T09:59:13.3368369Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T09:59:13.3368378Z 2025-12-04T09:59:13.3368581Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3369238Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3369244Z 2025-12-04T09:59:13.3369479Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3369484Z 2025-12-04T09:59:13.3369637Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3369744Z Traceback (most recent call last): 2025-12-04T09:59:13.3370239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3370339Z getattr(self, test_name)() 2025-12-04T09:59:13.3370814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3370905Z fn() 2025-12-04T09:59:13.3371380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3371477Z method(*args, **kwargs) 2025-12-04T09:59:13.3371934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3372028Z method(*args, **kwargs) 2025-12-04T09:59:13.3372483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3372568Z with policy(): 2025-12-04T09:59:13.3373020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3373155Z raise RuntimeError(msg) 2025-12-04T09:59:13.3374272Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.3374303Z 2025-12-04T09:59:13.3374503Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3375133Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3375138Z 2025-12-04T09:59:13.3375375Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3375380Z 2025-12-04T09:59:13.3375393Z 2025-12-04T09:59:13.3375590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3375825Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.3376630Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bdae057bafb686b9.xml - 2025-12-04T09:59:13.3376975Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3377851Z FAILED [27.4146s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3377980Z Traceback (most recent call last): 2025-12-04T09:59:13.3378533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3378651Z getattr(self, test_name)() 2025-12-04T09:59:13.3379187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3379280Z fn() 2025-12-04T09:59:13.3379795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3379903Z method(*args, **kwargs) 2025-12-04T09:59:13.3380451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3380559Z method(*args, **kwargs) 2025-12-04T09:59:13.3381061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3381165Z with policy(): 2025-12-04T09:59:13.3381669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3381778Z raise RuntimeError(msg) 2025-12-04T09:59:13.3383039Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T09:59:13.3383047Z 2025-12-04T09:59:13.3383291Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3384015Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3384022Z 2025-12-04T09:59:13.3384286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3384291Z 2025-12-04T09:59:13.3384463Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3384584Z Traceback (most recent call last): 2025-12-04T09:59:13.3385130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3385253Z getattr(self, test_name)() 2025-12-04T09:59:13.3385835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3385923Z fn() 2025-12-04T09:59:13.3386439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3386571Z method(*args, **kwargs) 2025-12-04T09:59:13.3387086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3387187Z method(*args, **kwargs) 2025-12-04T09:59:13.3387689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3387797Z with policy(): 2025-12-04T09:59:13.3388304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3388414Z raise RuntimeError(msg) 2025-12-04T09:59:13.3389804Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.3389812Z 2025-12-04T09:59:13.3390003Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3390644Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3390648Z 2025-12-04T09:59:13.3390881Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3391049Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.3391205Z ======================= 1 failed, 7 deselected in 27.63s ======================= 2025-12-04T09:59:13.3391294Z Got exit code 1 2025-12-04T09:59:13.3391398Z Retrying single test... 
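Note on the failure above: it comes from PyTorch's CUDA memory leak check (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which compares caching-allocator and driver-level memory on each device before and after the test body. The snippet below is only a minimal, hypothetical sketch of that comparison using public torch.cuda APIs; the function name assert_no_cuda_leak is made up for illustration, and this is not the actual check implemented in torch/testing/_internal/common_utils.py.

import torch

def assert_no_cuda_leak(run_test, device: int = 0) -> None:
    """Hypothetical sketch: raise if run_test leaves CUDA memory behind on device."""
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)    # caching-allocator bytes in use
    free_before, total = torch.cuda.mem_get_info(device)  # driver-level free/total bytes
    driver_before = total - free_before

    run_test()

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if alloc_after > alloc_before or driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver {driver_before} -> {driver_after} bytes"
        )

In the log, a comparison of this kind is what reports the caching allocator growing from 512 to 117248 bytes and the driver allocation growing to roughly 10 GB on each rank, so every worker exits with code 10 and the harness retries the single test.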
2025-12-04T09:59:13.3391951Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-eb4953947b5f3ef2.xml 2025-12-04T09:59:13.3392123Z ============================= test session starts ============================== 2025-12-04T09:59:13.3392445Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3392540Z cachedir: .pytest_cache 2025-12-04T09:59:13.3393005Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3393113Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3393207Z configfile: pytest.ini 2025-12-04T09:59:13.3393687Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3393882Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.3394610Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3394726Z Running 1 items in this shard 2025-12-04T09:59:13.3394733Z 2025-12-04T09:59:13.3395683Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda I1204 09:38:22.164000 51982 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 52034 2025-12-04T09:59:13.3396137Z I1204 09:38:22.165000 51982 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 52035 2025-12-04T09:59:13.3396576Z I1204 09:38:22.165000 51982 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 52036 2025-12-04T09:59:13.3397019Z I1204 09:38:22.166000 51982 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 52037 2025-12-04T09:59:13.3398856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3398982Z _warn_cpu_init() 2025-12-04T09:59:13.3400757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3400848Z _warn_cpu_init() 2025-12-04T09:59:13.3402634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3402724Z _warn_cpu_init() 2025-12-04T09:59:13.3403661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3403746Z _init_core_state( 2025-12-04T09:59:13.3404679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3404769Z _init_core_state( 2025-12-04T09:59:13.3405704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3405806Z _init_core_state( 2025-12-04T09:59:13.3407325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3407484Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3409029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3409184Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3410690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3410872Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3412666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3412783Z _warn_cpu_init() 2025-12-04T09:59:13.3413705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
2025-12-04T09:59:13.3413794Z _init_core_state( 2025-12-04T09:59:13.3415312Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3415466Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3417279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3417443Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3419194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3419362Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3421297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3421475Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3426048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3426459Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3431013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3431441Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3435846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3436268Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3440657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3441018Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3441709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.3441824Z return func(*args, **kwargs) 2025-12-04T09:59:13.3442498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3442593Z return func(*args, **kwargs) 2025-12-04T09:59:13.3443303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3443402Z return func(*args, **kwargs) 2025-12-04T09:59:13.3444085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3444206Z return func(*args, **kwargs) 2025-12-04T09:59:13.3444875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3444977Z return func(*args, **kwargs) 2025-12-04T09:59:13.3445644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3445743Z return func(*args, **kwargs) 2025-12-04T09:59:13.3446419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3446516Z return func(*args, **kwargs) 2025-12-04T09:59:13.3447201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3447299Z return func(*args, **kwargs) 2025-12-04T09:59:13.3448184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.3448287Z return func(*args, **kwargs) 2025-12-04T09:59:13.3448698Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3449179Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3450099Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3450561Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3451440Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3451792Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3452678Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3453114Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3453970Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3454404Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3455251Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3455684Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3456609Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3457311Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3459023Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 714014720 and is now 10532880384. 
2025-12-04T09:59:13.3459398Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3460060Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3461238Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3461602Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3462321Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3462870Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3463328Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3463897Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3464907Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3465415Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3466412Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3466809Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3467810Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3468303Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3469347Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3469782Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3470667Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3471072Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3471953Z [rank1]:E1204 09:38:31.899000 52035 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3472397Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3473906Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 2025-12-04T09:59:13.3474240Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3474823Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3475855Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3476175Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3476815Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3477329Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3477732Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3478207Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3479094Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3479546Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3480452Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3480803Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3481660Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3482091Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3482947Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3483408Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3484259Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3484687Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3485541Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3485986Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3487497Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T09:59:13.3487828Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3488412Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3493840Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3494245Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3494987Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3495483Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3495887Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3496475Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3497630Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3498200Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3499192Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3499600Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3500565Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3501049Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3502055Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3502579Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3503542Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3503988Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3504953Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3505441Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3507154Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 607059968 and is now 10421731328. 
2025-12-04T09:59:13.3507520Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3508179Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3509515Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3509870Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3510510Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3510987Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3511077Z dist init r=2, world=4 2025-12-04T09:59:13.3511169Z dist init r=0, world=4 2025-12-04T09:59:13.3511252Z dist init r=1, world=4 2025-12-04T09:59:13.3511340Z dist init r=3, world=4 2025-12-04T09:59:13.3512392Z [rank0]:[W1204 09:38:32.916397137 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3513403Z [rank2]:[W1204 09:38:32.918946814 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3514411Z [rank1]:[W1204 09:38:32.919209159 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3515412Z [rank3]:[W1204 09:38:32.928217137 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3515532Z FAILED [27.2325s] [100%] 2025-12-04T09:59:13.3515541Z 2025-12-04T09:59:13.3515671Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3515989Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.3516092Z Traceback (most recent call last): 2025-12-04T09:59:13.3516576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3516680Z self._join_processes(fn) 2025-12-04T09:59:13.3517195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3517316Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3517859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3517957Z raise RuntimeError(error) 2025-12-04T09:59:13.3518170Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.3518277Z Traceback (most recent call last): 2025-12-04T09:59:13.3518756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3518857Z getattr(self, test_name)() 2025-12-04T09:59:13.3519324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3519401Z fn() 2025-12-04T09:59:13.3519862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3519952Z method(*args, **kwargs) 2025-12-04T09:59:13.3520406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3520496Z method(*args, **kwargs) 2025-12-04T09:59:13.3521302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3521420Z with policy(): 2025-12-04T09:59:13.3521931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3522045Z raise RuntimeError(msg) 2025-12-04T09:59:13.3523292Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 714014720 and is now 10532880384. 
2025-12-04T09:59:13.3523299Z 2025-12-04T09:59:13.3523510Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3524236Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3524302Z 2025-12-04T09:59:13.3524570Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3524578Z 2025-12-04T09:59:13.3524583Z 2025-12-04T09:59:13.3524810Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3525071Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.3525881Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-eb4953947b5f3ef2.xml - 2025-12-04T09:59:13.3526057Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3526929Z FAILED [27.2325s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.3527103Z Traceback (most recent call last): 2025-12-04T09:59:13.3527655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3527804Z getattr(self, test_name)() 2025-12-04T09:59:13.3528344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3528433Z fn() 2025-12-04T09:59:13.3528946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3529050Z method(*args, **kwargs) 2025-12-04T09:59:13.3529552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3529660Z method(*args, **kwargs) 2025-12-04T09:59:13.3530162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3530256Z with policy(): 2025-12-04T09:59:13.3530772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3530883Z raise RuntimeError(msg) 2025-12-04T09:59:13.3532144Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 714014720 and is now 10532880384. 2025-12-04T09:59:13.3532150Z 2025-12-04T09:59:13.3532364Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3533076Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3533084Z 2025-12-04T09:59:13.3533476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3533762Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.3534087Z ====================== 1 failed, 26 deselected in 27.45s ======================= 2025-12-04T09:59:13.3534177Z Got exit code 1 2025-12-04T09:59:13.3534270Z Retrying single test... 2025-12-04T09:59:13.3534825Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-532f83d54e2054ff.xml 2025-12-04T09:59:13.3534967Z ============================= test session starts ============================== 2025-12-04T09:59:13.3535284Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3535376Z cachedir: .pytest_cache 2025-12-04T09:59:13.3535830Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3535949Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3536040Z configfile: pytest.ini 2025-12-04T09:59:13.3536624Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3537022Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.3537816Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3537934Z Running 1 items in this shard 2025-12-04T09:59:13.3537940Z 2025-12-04T09:59:13.3538998Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda I1204 09:38:54.044000 53087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 53139 2025-12-04T09:59:13.3539538Z I1204 09:38:54.045000 53087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 53140 2025-12-04T09:59:13.3540032Z I1204 09:38:54.046000 53087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 53141 2025-12-04T09:59:13.3540521Z I1204 09:38:54.046000 53087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 53142 2025-12-04T09:59:13.3542598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3542693Z _warn_cpu_init() 2025-12-04T09:59:13.3544728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.3544828Z _warn_cpu_init() 2025-12-04T09:59:13.3546840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3546940Z _warn_cpu_init() 2025-12-04T09:59:13.3548016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3548117Z _init_core_state( 2025-12-04T09:59:13.3549330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3549425Z _init_core_state( 2025-12-04T09:59:13.3550344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3550439Z _init_core_state( 2025-12-04T09:59:13.3551991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3552144Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3553671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3553812Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3555401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3555572Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3557373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.3557462Z _warn_cpu_init() 2025-12-04T09:59:13.3558397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3558482Z _init_core_state( 2025-12-04T09:59:13.3559987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3560139Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3561653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3561809Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3563340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3563493Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3565022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3565175Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3569537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.3569954Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3574378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3574770Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3579527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3579921Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3584480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.3584874Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3585648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3585787Z return func(*args, **kwargs) 2025-12-04T09:59:13.3586563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3586702Z return func(*args, **kwargs) 2025-12-04T09:59:13.3587466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3587580Z return func(*args, **kwargs) 2025-12-04T09:59:13.3588447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3588563Z return func(*args, **kwargs) 2025-12-04T09:59:13.3589296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3589399Z return func(*args, **kwargs) 2025-12-04T09:59:13.3590252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3590355Z return func(*args, **kwargs) 2025-12-04T09:59:13.3591070Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3591170Z return func(*args, **kwargs) 2025-12-04T09:59:13.3591877Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3591985Z return func(*args, **kwargs) 2025-12-04T09:59:13.3592922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.3593030Z return func(*args, **kwargs) 2025-12-04T09:59:13.3593493Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3593994Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3594943Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3595419Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3596393Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3596768Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3597676Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3598300Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3599239Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3599763Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3600696Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3601162Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3602092Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3602572Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3604320Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 
2025-12-04T09:59:13.3604664Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3605286Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3606376Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3606727Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3607436Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3607959Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3608384Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3608878Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3609825Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3610403Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3611317Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3611672Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3612532Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3612963Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3613843Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3614282Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3615160Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3615559Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3616501Z [rank1]:E1204 09:39:03.789000 53140 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3617145Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3618862Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 2025-12-04T09:59:13.3619226Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3619885Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3621281Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3621729Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3622445Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3622988Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3623442Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3623971Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3625015Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3625524Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3626519Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3626913Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3627871Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3628401Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3629363Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3629897Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3630856Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3631304Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3632272Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3632973Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3634497Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.3634816Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3635407Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3636461Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3636793Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3637426Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3637907Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3638312Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3638806Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3639704Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3640150Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3641035Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3641386Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3642263Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3642704Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3643587Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3644024Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3644868Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3645274Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3646128Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3646560Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3648066Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T09:59:13.3648390Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3649011Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3650037Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3650368Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3650996Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3651480Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3651613Z dist init r=1, world=4 2025-12-04T09:59:13.3651702Z dist init r=0, world=4 2025-12-04T09:59:13.3651799Z dist init r=2, world=4 2025-12-04T09:59:13.3651884Z dist init r=3, world=4 2025-12-04T09:59:13.3652909Z [rank1]:[W1204 09:39:04.811372721 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3653926Z [rank0]:[W1204 09:39:04.815044768 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3654962Z [rank2]:[W1204 09:39:04.817517465 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3655979Z [rank3]:[W1204 09:39:04.821384071 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3656097Z FAILED [27.6191s] [100%] 2025-12-04T09:59:13.3656102Z 2025-12-04T09:59:13.3656242Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3656605Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.3656886Z Traceback (most recent call last): 2025-12-04T09:59:13.3657443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3657556Z self._join_processes(fn) 2025-12-04T09:59:13.3658150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3658299Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3658909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3659029Z raise RuntimeError(error) 2025-12-04T09:59:13.3659262Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.3659378Z Traceback (most recent call last): 2025-12-04T09:59:13.3659923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3660033Z getattr(self, test_name)() 2025-12-04T09:59:13.3660563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3660658Z fn() 2025-12-04T09:59:13.3661165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3661312Z method(*args, **kwargs) 2025-12-04T09:59:13.3661818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3661916Z method(*args, **kwargs) 2025-12-04T09:59:13.3662425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3662519Z with policy(): 2025-12-04T09:59:13.3663024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3663143Z raise RuntimeError(msg) 2025-12-04T09:59:13.3664416Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 
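Note: the ProcessGroupNCCL warnings above ("destroy_process_group() was not called before program exit, which can leak resources") ask each worker to tear down its process group explicitly. A minimal sketch of that pattern, assuming a single default group per rank and an env:// rendezvous already configured; this is illustrative only, not the test harness code.

    import torch.distributed as dist

    def run_worker(rank, world_size):
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            pass  # ... distributed test body ...
        finally:
            # Explicit teardown avoids the "can leak resources" warning at exit.
            dist.destroy_process_group()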
2025-12-04T09:59:13.3664424Z 2025-12-04T09:59:13.3664651Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3665364Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3665370Z 2025-12-04T09:59:13.3665645Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3665650Z 2025-12-04T09:59:13.3665811Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3665931Z Traceback (most recent call last): 2025-12-04T09:59:13.3666480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3666616Z getattr(self, test_name)() 2025-12-04T09:59:13.3667153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3667250Z fn() 2025-12-04T09:59:13.3667758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3667896Z method(*args, **kwargs) 2025-12-04T09:59:13.3668399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3668503Z method(*args, **kwargs) 2025-12-04T09:59:13.3669107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3669194Z with policy(): 2025-12-04T09:59:13.3669648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3669748Z raise RuntimeError(msg) 2025-12-04T09:59:13.3670857Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T09:59:13.3670864Z 2025-12-04T09:59:13.3671056Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3671686Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3671691Z 2025-12-04T09:59:13.3671932Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3671936Z 2025-12-04T09:59:13.3672079Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3672190Z Traceback (most recent call last): 2025-12-04T09:59:13.3672681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3672778Z getattr(self, test_name)() 2025-12-04T09:59:13.3673288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3673366Z fn() 2025-12-04T09:59:13.3673812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3673908Z method(*args, **kwargs) 2025-12-04T09:59:13.3674353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3674442Z method(*args, **kwargs) 2025-12-04T09:59:13.3674890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3674973Z with policy(): 2025-12-04T09:59:13.3675426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3675546Z raise RuntimeError(msg) 2025-12-04T09:59:13.3676647Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T09:59:13.3676654Z 2025-12-04T09:59:13.3676848Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3677474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3677479Z 2025-12-04T09:59:13.3677718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3677754Z 2025-12-04T09:59:13.3677758Z 2025-12-04T09:59:13.3677950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3678192Z Process 0 terminated with exit code 10, terminating remaining processes. 
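Note: "Process 0 terminated with exit code 10, terminating remaining processes" comes from the multi-process wrapper visible in the traceback (_join_processes / _check_return_codes): each rank runs the test in its own process, and a non-zero exit code from any rank fails the whole test. A simplified sketch of that pattern with made-up names, not the common_distributed.py implementation:

    import multiprocessing as mp

    def _worker(rank, world_size):
        # Per-rank test body; a failing rank exits non-zero (this run used code 10).
        raise SystemExit(0)

    def run_multiprocess_test(world_size=4):
        procs = [mp.Process(target=_worker, args=(r, world_size)) for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        codes = [p.exitcode for p in procs]
        if any(code != 0 for code in codes):
            raise RuntimeError(f"worker exit codes: {codes}")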
2025-12-04T09:59:13.3678896Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-532f83d54e2054ff.xml - 2025-12-04T09:59:13.3679070Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3679850Z FAILED [27.6191s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.3679955Z Traceback (most recent call last): 2025-12-04T09:59:13.3680445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3680543Z getattr(self, test_name)() 2025-12-04T09:59:13.3681013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3681098Z fn() 2025-12-04T09:59:13.3681543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3681637Z method(*args, **kwargs) 2025-12-04T09:59:13.3682091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3682178Z method(*args, **kwargs) 2025-12-04T09:59:13.3682808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3682900Z with policy(): 2025-12-04T09:59:13.3683374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3683486Z raise RuntimeError(msg) 2025-12-04T09:59:13.3684686Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 
2025-12-04T09:59:13.3684694Z 2025-12-04T09:59:13.3684902Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3685568Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3685573Z 2025-12-04T09:59:13.3685820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3685825Z 2025-12-04T09:59:13.3685981Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3686095Z Traceback (most recent call last): 2025-12-04T09:59:13.3686617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3686746Z getattr(self, test_name)() 2025-12-04T09:59:13.3687253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3687347Z fn() 2025-12-04T09:59:13.3687823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3687919Z method(*args, **kwargs) 2025-12-04T09:59:13.3688398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3688498Z method(*args, **kwargs) 2025-12-04T09:59:13.3688976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3689095Z with policy(): 2025-12-04T09:59:13.3689567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3689678Z raise RuntimeError(msg) 2025-12-04T09:59:13.3690848Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T09:59:13.3690880Z 2025-12-04T09:59:13.3691096Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3691760Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3691765Z 2025-12-04T09:59:13.3692011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3692026Z 2025-12-04T09:59:13.3692176Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3692287Z Traceback (most recent call last): 2025-12-04T09:59:13.3692808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3692917Z getattr(self, test_name)() 2025-12-04T09:59:13.3693418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3693507Z fn() 2025-12-04T09:59:13.3693987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3694084Z method(*args, **kwargs) 2025-12-04T09:59:13.3694558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3694653Z method(*args, **kwargs) 2025-12-04T09:59:13.3695126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3695219Z with policy(): 2025-12-04T09:59:13.3695737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3695838Z raise RuntimeError(msg) 2025-12-04T09:59:13.3697298Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T09:59:13.3697305Z 2025-12-04T09:59:13.3697521Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3698233Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3698241Z 2025-12-04T09:59:13.3698503Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3698720Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
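Note: the surrounding output shows the runner's retry flow: the shard fails, the single failing test is re-run in a fresh pytest session, and after it fails again it is reported below as FAILED CONSISTENTLY while the rest of the shard continues (continue-through-error). A rough sketch of that control flow; the helper name and the pytest invocation are hypothetical, not the actual run_test.py logic.

    import subprocess

    def run_with_single_retry(test_id, continue_through_error=True):
        # First attempt; on failure, retry only the failing test once.
        first = subprocess.run(["python", "-m", "pytest", test_id]).returncode
        if first == 0:
            return True
        print(f"Got exit code {first}")
        print("Retrying single test...")
        retry = subprocess.run(["python", "-m", "pytest", test_id]).returncode
        if retry == 0:
            return True
        print(f"FAILED CONSISTENTLY: {test_id}")
        if not continue_through_error:
            raise SystemExit(retry)
        return False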
2025-12-04T09:59:13.3698900Z ====================== 1 failed, 26 deselected in 27.84s ======================= 2025-12-04T09:59:13.3698999Z Got exit code 1 2025-12-04T09:59:13.3699637Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3700040Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.3700657Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3483d762b5b4fca1.xml 2025-12-04T09:59:13.3700825Z ============================= test session starts ============================== 2025-12-04T09:59:13.3701203Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3701318Z cachedir: .pytest_cache 2025-12-04T09:59:13.3701836Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3701989Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3702096Z configfile: pytest.ini 2025-12-04T09:59:13.3702627Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3702842Z collecting ... collected 60 items / 8 deselected / 52 selected 2025-12-04T09:59:13.3702980Z stepcurrent: skipping 8 already run items. 2025-12-04T09:59:13.3703089Z Running 19 items in this shard 2025-12-04T09:59:13.3703094Z 2025-12-04T09:59:13.3704241Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda I1204 09:39:25.944000 54192 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 54244 2025-12-04T09:59:13.3704738Z I1204 09:39:25.945000 54192 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 54245 2025-12-04T09:59:13.3705235Z I1204 09:39:25.946000 54192 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 54246 2025-12-04T09:59:13.3705724Z I1204 09:39:25.946000 54192 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 54247 2025-12-04T09:59:13.3707748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3707853Z _warn_cpu_init() 2025-12-04T09:59:13.3709860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.3709959Z _warn_cpu_init() 2025-12-04T09:59:13.3711738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3711834Z _warn_cpu_init() 2025-12-04T09:59:13.3713363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3713521Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3715047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3715224Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3716732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3716899Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3718694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3718781Z _warn_cpu_init() 2025-12-04T09:59:13.3720302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3720443Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3721692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.3721940Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3722998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3723240Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3724220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3724459Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3726177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3726385Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3728089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3728255Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3729244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3729499Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3730501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3730753Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3731751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3731962Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3732955Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.3733072Z return func(*args, **kwargs) 2025-12-04T09:59:13.3734192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.3734418Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3735942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3736090Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3737264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3737488Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3738305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3738417Z return func(*args, **kwargs) 2025-12-04T09:59:13.3739193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3739299Z return func(*args, **kwargs) 2025-12-04T09:59:13.3740058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3740172Z return func(*args, **kwargs) 2025-12-04T09:59:13.3740954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3741073Z return func(*args, **kwargs) 2025-12-04T09:59:13.3741827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3741930Z return func(*args, **kwargs) 2025-12-04T09:59:13.3742685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3742789Z return func(*args, **kwargs) 2025-12-04T09:59:13.3743545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3743679Z return func(*args, **kwargs) 2025-12-04T09:59:13.3744434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.3744590Z return func(*args, **kwargs) 2025-12-04T09:59:13.3745046Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3745584Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3746584Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3747094Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3748095Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3748492Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3749517Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3749950Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3750807Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3751273Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3752128Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3752533Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3753385Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3753823Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3755425Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 720306176 and is now 10516103168. 
2025-12-04T09:59:13.3755757Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3756343Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3757443Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3757798Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3758431Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3758949Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3759348Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3759827Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3760712Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3761168Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3762058Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3762407Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3763263Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3763696Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3764576Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3765008Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3765854Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3766255Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3767185Z [rank1]:E1204 09:39:52.294000 54245 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3767631Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3769213Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.3769539Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3770123Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3771251Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3771608Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3772240Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3772731Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3773128Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3773606Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3774499Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3774948Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3775836Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3776187Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3777342Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3777867Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3778835Z [rank2]:E1204 09:39:52.295000 54246 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3779313Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3780271Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3780750Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3781715Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3782212Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3783984Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.3784409Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3785065Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3786332Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3786697Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3787408Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3787955Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3788403Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3789044Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3790043Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3790489Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3791367Z [rank3]:E1204 09:39:52.296000 54247 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3791719Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3792601Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3793034Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3793894Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3794322Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3795195Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3795596Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3796447Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3796884Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3798457Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.3798845Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3799426Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3800520Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3800843Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3801477Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3801965Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3802055Z dist init r=2, world=4 2025-12-04T09:59:13.3802139Z dist init r=1, world=4 2025-12-04T09:59:13.3802227Z dist init r=3, world=4 2025-12-04T09:59:13.3802310Z dist init r=0, world=4 2025-12-04T09:59:13.3803344Z [rank2]:[W1204 09:39:52.322601367 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3804353Z [rank1]:[W1204 09:39:52.323888686 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3805392Z [rank3]:[W1204 09:39:52.326083814 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3806403Z [rank0]:[W1204 09:39:52.330574149 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3806492Z FAILED [47.5521s] [ 5%] 2025-12-04T09:59:13.3806497Z 2025-12-04T09:59:13.3806637Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3807003Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.3807116Z Traceback (most recent call last): 2025-12-04T09:59:13.3807628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3807731Z self._join_processes(fn) 2025-12-04T09:59:13.3808255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3808379Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3808921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3809023Z raise RuntimeError(error) 2025-12-04T09:59:13.3809231Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3809369Z Traceback (most recent call last): 2025-12-04T09:59:13.3809847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3809945Z getattr(self, test_name)() 2025-12-04T09:59:13.3810435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3810540Z fn() 2025-12-04T09:59:13.3810998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3811091Z method(*args, **kwargs) 2025-12-04T09:59:13.3811538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3811639Z method(*args, **kwargs) 2025-12-04T09:59:13.3812083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3812166Z with policy(): 2025-12-04T09:59:13.3812818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3812922Z raise RuntimeError(msg) 2025-12-04T09:59:13.3814175Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.3814183Z 2025-12-04T09:59:13.3814381Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3815118Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3815130Z 2025-12-04T09:59:13.3815376Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3815383Z 2025-12-04T09:59:13.3815388Z 2025-12-04T09:59:13.3815590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3815846Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.3816705Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3483d762b5b4fca1.xml - 2025-12-04T09:59:13.3817062Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3818006Z FAILED [47.5521s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3818124Z Traceback (most recent call last): 2025-12-04T09:59:13.3818678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3818794Z getattr(self, test_name)() 2025-12-04T09:59:13.3819368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3819456Z fn() 2025-12-04T09:59:13.3819965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3820072Z method(*args, **kwargs) 2025-12-04T09:59:13.3820573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3820674Z method(*args, **kwargs) 2025-12-04T09:59:13.3821388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3821488Z with policy(): 2025-12-04T09:59:13.3822002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3822175Z raise RuntimeError(msg) 2025-12-04T09:59:13.3823500Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.3823547Z 2025-12-04T09:59:13.3823765Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3824549Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3824554Z 2025-12-04T09:59:13.3824821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3824996Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
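Note on the warnings in the run above: the repeated UserWarnings ("FSDP got the argument `device_id` cuda ... which does not have an explicit index" and "The passed-in `module` is on CPU ...") state their own remediation, namely binding each rank to its GPU before FSDP initialization or passing an explicit device index as `device_id`. The snippet below is a minimal illustrative sketch of that remediation only, not code from this repository or test suite; `wrap_for_rank`, `rank`, and `model` are placeholder names, while `device_id` and `sync_module_states` are the FSDP keyword arguments the warnings refer to. The FutureWarning in the same block is a separate matter: `NO_SHARD` is deprecated in favor of `DistributedDataParallel`.

# Illustrative sketch only (not from the test suite): follow the warnings'
# own advice by pinning the rank's device before FSDP init and passing an
# explicit device index. `rank` and `model` are hypothetical placeholders.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_for_rank(model: torch.nn.Module, rank: int) -> FSDP:
    # Make "cuda" resolve to this rank's GPU before any FSDP call,
    # which avoids the "does not have an explicit index" warning.
    torch.cuda.set_device(rank)
    # An explicit device_id lets FSDP move the CPU-resident module to the
    # GPU for sharding init and is required for sync_module_states=True.
    return FSDP(model, device_id=rank, sync_module_states=True)

In a multi-process run like the one logged here, `rank` would be the per-process rank reported at the "dist init r=..., world=4" lines.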
2025-12-04T09:59:13.3825169Z ======================= 1 failed, 8 deselected in 47.77s ======================= 2025-12-04T09:59:13.3825270Z Got exit code 1 2025-12-04T09:59:13.3825376Z Retrying single test... 2025-12-04T09:59:13.3826009Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c6b2032ef8ff1e94.xml 2025-12-04T09:59:13.3826169Z ============================= test session starts ============================== 2025-12-04T09:59:13.3826512Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3826623Z cachedir: .pytest_cache 2025-12-04T09:59:13.3827136Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3827257Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3827368Z configfile: pytest.ini 2025-12-04T09:59:13.3827902Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3828121Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.3829022Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3829135Z Running 1 items in this shard 2025-12-04T09:59:13.3829141Z 2025-12-04T09:59:13.3830285Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda I1204 09:40:18.514000 55441 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 55493 2025-12-04T09:59:13.3830778Z I1204 09:40:18.514000 55441 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 55494 2025-12-04T09:59:13.3831275Z I1204 09:40:18.515000 55441 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 55495 2025-12-04T09:59:13.3831797Z I1204 09:40:18.516000 55441 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 55496 2025-12-04T09:59:13.3834002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3834097Z _warn_cpu_init() 2025-12-04T09:59:13.3836008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.3836132Z _warn_cpu_init() 2025-12-04T09:59:13.3837777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3837939Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3839541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3839702Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3841599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3841695Z _warn_cpu_init() 2025-12-04T09:59:13.3844166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3844279Z _warn_cpu_init() 2025-12-04T09:59:13.3845943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3846106Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3847801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3847966Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3848935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.3849169Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3850138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3850394Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3852060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3852245Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3853207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3853429Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3854383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3854603Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3855572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.3855683Z return func(*args, **kwargs) 2025-12-04T09:59:13.3856734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3856970Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3858906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3859081Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3860089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3860305Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3861293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.3861545Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3863278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3863452Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3864437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3864660Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3865454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3865564Z return func(*args, **kwargs) 2025-12-04T09:59:13.3866340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3866474Z return func(*args, **kwargs) 2025-12-04T09:59:13.3867245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3870360Z return func(*args, **kwargs) 2025-12-04T09:59:13.3871091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3871205Z return func(*args, **kwargs) 2025-12-04T09:59:13.3871924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3872025Z return func(*args, **kwargs) 2025-12-04T09:59:13.3872741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3872846Z return func(*args, **kwargs) 2025-12-04T09:59:13.3873553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3873684Z return func(*args, **kwargs) 2025-12-04T09:59:13.3874390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.3874500Z return func(*args, **kwargs) 2025-12-04T09:59:13.3874934Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3875487Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3876436Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3876913Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3877853Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3878226Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3879159Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3879624Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3880522Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3880984Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3882023Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3882430Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3883277Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3883707Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3885347Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 720306176 and is now 10516103168. 
2025-12-04T09:59:13.3885674Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3886262Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3887355Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3887684Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3888317Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3888832Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3889234Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3889702Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3890602Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3891054Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3891958Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3892311Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3893159Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3893594Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3894441Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3894906Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3895756Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3896161Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3897336Z [rank2]:E1204 09:40:41.538000 55495 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3897895Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3899684Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.3900043Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3900703Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3901938Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3902311Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3903057Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3903608Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3904056Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3904589Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3905596Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3906134Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3907124Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3907518Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3908482Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3909203Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3910057Z [rank1]:E1204 09:40:41.539000 55494 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3910489Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3911332Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3911771Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3912621Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3913059Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3914635Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.3914957Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3915547Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3916677Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3917004Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3917632Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3918119Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3918517Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3918985Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3919898Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3920349Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3921585Z [rank3]:E1204 09:40:41.539000 55496 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3921991Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3923022Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3923513Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3924469Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3924958Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3925967Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3926421Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3927385Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3927875Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3929647Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.3930011Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3930718Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3931956Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3932325Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3933039Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3933698Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3933796Z dist init r=0, world=4 2025-12-04T09:59:13.3933923Z dist init r=1, world=4 2025-12-04T09:59:13.3934022Z dist init r=3, world=4 2025-12-04T09:59:13.3934115Z dist init r=2, world=4 2025-12-04T09:59:13.3935201Z [rank0]:[W1204 09:40:41.558527906 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3936363Z [rank1]:[W1204 09:40:41.561148993 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3937663Z [rank3]:[W1204 09:40:41.562651782 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3938855Z [rank2]:[W1204 09:40:41.563403561 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3938956Z FAILED [41.5665s] [100%] 2025-12-04T09:59:13.3938962Z 2025-12-04T09:59:13.3939115Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3939519Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.3939676Z Traceback (most recent call last): 2025-12-04T09:59:13.3940228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3940338Z self._join_processes(fn) 2025-12-04T09:59:13.3940933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3941071Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3941671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3941790Z raise RuntimeError(error) 2025-12-04T09:59:13.3942021Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.3942140Z Traceback (most recent call last): 2025-12-04T09:59:13.3942683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3942791Z getattr(self, test_name)() 2025-12-04T09:59:13.3943323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3943414Z fn() 2025-12-04T09:59:13.3943924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3944062Z method(*args, **kwargs) 2025-12-04T09:59:13.3944566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3944665Z method(*args, **kwargs) 2025-12-04T09:59:13.3945181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3945277Z with policy(): 2025-12-04T09:59:13.3945792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3945898Z raise RuntimeError(msg) 2025-12-04T09:59:13.3947241Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 
2025-12-04T09:59:13.3947258Z 2025-12-04T09:59:13.3947475Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3948262Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3948268Z 2025-12-04T09:59:13.3948541Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3948550Z 2025-12-04T09:59:13.3948809Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3948923Z Traceback (most recent call last): 2025-12-04T09:59:13.3949404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3949545Z getattr(self, test_name)() 2025-12-04T09:59:13.3950025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3950105Z fn() 2025-12-04T09:59:13.3950552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3950647Z method(*args, **kwargs) 2025-12-04T09:59:13.3951093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3951189Z method(*args, **kwargs) 2025-12-04T09:59:13.3951662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3951747Z with policy(): 2025-12-04T09:59:13.3952199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3952294Z raise RuntimeError(msg) 2025-12-04T09:59:13.3953466Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.3953480Z 2025-12-04T09:59:13.3953668Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3954365Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3954372Z 2025-12-04T09:59:13.3954613Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3954618Z 2025-12-04T09:59:13.3954622Z 2025-12-04T09:59:13.3954819Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3955059Z Process 1 terminated with exit code 10, terminating remaining processes. 
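
The ProcessGroupNCCL warnings above ("destroy_process_group() was not called before program exit, which can leak resources") are separate from the leak-check failure itself, but they describe the per-rank cleanup the harness is expected to perform. The sketch below shows that teardown pattern; it is illustrative only, not the test harness's actual code. The helper name run_rank and the assumption that MASTER_ADDR/MASTER_PORT are already exported are mine, and the device_id argument to init_process_group (which also mutes the barrier() device warning seen later in this log) only exists on recent PyTorch releases.

import torch
import torch.distributed as dist

def run_rank(rank: int, world_size: int) -> None:
    # Assumes the launcher has exported MASTER_ADDR / MASTER_PORT.
    torch.cuda.set_device(rank)
    dist.init_process_group(
        "nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device("cuda", rank),  # optional; silences the barrier() device warning
    )
    try:
        pass  # ... test body / training step would run here ...
    finally:
        dist.barrier()                 # make sure every rank has finished
        dist.destroy_process_group()   # the cleanup the NCCL warning asks for
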
2025-12-04T09:59:13.3955801Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c6b2032ef8ff1e94.xml - 2025-12-04T09:59:13.3955952Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3956790Z FAILED [41.5665s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.3956896Z Traceback (most recent call last): 2025-12-04T09:59:13.3957391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3957489Z getattr(self, test_name)() 2025-12-04T09:59:13.3957968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3958052Z fn() 2025-12-04T09:59:13.3958524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3958628Z method(*args, **kwargs) 2025-12-04T09:59:13.3959073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3959164Z method(*args, **kwargs) 2025-12-04T09:59:13.3959611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3959698Z with policy(): 2025-12-04T09:59:13.3960150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3960251Z raise RuntimeError(msg) 2025-12-04T09:59:13.3961452Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 
2025-12-04T09:59:13.3961458Z 2025-12-04T09:59:13.3961656Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3962357Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3962362Z 2025-12-04T09:59:13.3962598Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3962631Z 2025-12-04T09:59:13.3962772Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3962881Z Traceback (most recent call last): 2025-12-04T09:59:13.3963376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3963474Z getattr(self, test_name)() 2025-12-04T09:59:13.3963952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3964036Z fn() 2025-12-04T09:59:13.3964481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3964579Z method(*args, **kwargs) 2025-12-04T09:59:13.3965024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3965115Z method(*args, **kwargs) 2025-12-04T09:59:13.3965568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3965653Z with policy(): 2025-12-04T09:59:13.3966107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3966202Z raise RuntimeError(msg) 2025-12-04T09:59:13.3967395Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.3967401Z 2025-12-04T09:59:13.3967598Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3968292Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3968299Z 2025-12-04T09:59:13.3968540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3968701Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.3968858Z ====================== 1 failed, 26 deselected in 41.78s ======================= 2025-12-04T09:59:13.3968971Z Got exit code 1 2025-12-04T09:59:13.3969062Z Retrying single test... 
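
The failure itself comes from the CUDA memory-leak checker enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 (the repro command printed above): it snapshots caching-allocator and CUDA-driver allocation per device before the test and compares again afterwards, and here every rank grew from 512 to 166400 allocator bytes while driver allocation rose to roughly 10.4 GB. The sketch below shows that kind of before/after bookkeeping using public torch.cuda APIs; it is a simplified stand-in, not the implementation in common_utils.py, and the helper name report_cuda_mem is invented. The "Retrying single test..." step that follows re-runs only this test so the harness can distinguish a flaky failure from a consistent one.

import torch

def report_cuda_mem(tag: str) -> None:
    # Compare the caching-allocator view with the driver view on every visible GPU.
    for dev in range(torch.cuda.device_count()):
        allocated = torch.cuda.memory_allocated(dev)   # bytes held by the caching allocator
        free, total = torch.cuda.mem_get_info(dev)     # driver-level free/total bytes
        print(f"{tag}: device {dev} allocator={allocated} driver_in_use={total - free}")

report_cuda_mem("before")
# ... run the body of the suspect test here ...
torch.cuda.synchronize()
report_cuda_mem("after")   # a persistent delta such as 512 -> 166400 bytes is what the checker flags
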
2025-12-04T09:59:13.3969614Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5647de3303d26f02.xml 2025-12-04T09:59:13.3969764Z ============================= test session starts ============================== 2025-12-04T09:59:13.3970072Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3970174Z cachedir: .pytest_cache 2025-12-04T09:59:13.3970629Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3970740Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3970839Z configfile: pytest.ini 2025-12-04T09:59:13.3971338Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3971536Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.3972304Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3972400Z Running 1 items in this shard 2025-12-04T09:59:13.3972405Z 2025-12-04T09:59:13.3973604Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda I1204 09:41:04.533000 56690 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 56742 2025-12-04T09:59:13.3974222Z I1204 09:41:04.534000 56690 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 56743 2025-12-04T09:59:13.3974692Z I1204 09:41:04.535000 56690 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 56744 2025-12-04T09:59:13.3975155Z I1204 09:41:04.536000 56690 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 56745 2025-12-04T09:59:13.3977356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3977460Z _warn_cpu_init() 2025-12-04T09:59:13.3979498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3979607Z _warn_cpu_init() 2025-12-04T09:59:13.3981617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3981724Z _warn_cpu_init() 2025-12-04T09:59:13.3983465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3983639Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3985340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3985513Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3987227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3987424Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3989585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3989703Z _warn_cpu_init() 2025-12-04T09:59:13.3991237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3991382Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3992264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3992473Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3993357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.3993566Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3994461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3994679Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3996207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3996364Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3997911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3998064Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3999570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3999720Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4000625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.4000820Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.4001707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.4001897Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.4002778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.4002995Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.4003868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.4004090Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.4004963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.4005159Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.4006035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4006140Z return func(*args, **kwargs) 2025-12-04T09:59:13.4006830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4006928Z return func(*args, **kwargs) 2025-12-04T09:59:13.4007652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4007747Z return func(*args, **kwargs) 2025-12-04T09:59:13.4008418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4008519Z return func(*args, **kwargs) 2025-12-04T09:59:13.4009194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4009301Z return func(*args, **kwargs) 2025-12-04T09:59:13.4010004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4010099Z return func(*args, **kwargs) 2025-12-04T09:59:13.4010773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4010869Z return func(*args, **kwargs) 2025-12-04T09:59:13.4011540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4011636Z return func(*args, **kwargs) 2025-12-04T09:59:13.4012311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.4012442Z return func(*args, **kwargs) 2025-12-04T09:59:13.4012849Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4013336Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4014223Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4014669Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4015582Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4015936Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4017062Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4017554Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4018517Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4019007Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4019972Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4020461Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4021628Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4022131Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4023979Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 714014720 and is now 10516103168. 
2025-12-04T09:59:13.4024353Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4025010Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4026264Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4026627Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4027383Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4027937Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4028388Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4028923Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4029930Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4030483Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4031489Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4031886Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4032958Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4033393Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4034249Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4034676Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4035556Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4035959Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4036811Z [rank2]:E1204 09:41:31.774000 56744 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4037251Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4038855Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T09:59:13.4039190Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4039769Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4040870Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4041215Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4041853Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4042339Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4042740Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4043248Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4044134Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4044586Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4045470Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4045820Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4046676Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4047107Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4047998Z [rank1]:E1204 09:41:31.774000 56743 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4048429Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4049271Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4049670Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4050525Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4050996Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4052569Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.4052897Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4053485Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4054625Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4054946Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4055579Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4056096Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4056572Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4057274Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4058279Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4058784Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4059779Z [rank3]:E1204 09:41:31.775000 56745 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4060179Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4061148Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4061667Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4062628Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4063113Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4064067Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4064523Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4065515Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4066014Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4067779Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.4068181Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4068840Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4070060Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4070387Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4071052Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4071538Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4071627Z dist init r=1, world=4 2025-12-04T09:59:13.4071722Z dist init r=2, world=4 2025-12-04T09:59:13.4071805Z dist init r=0, world=4 2025-12-04T09:59:13.4071891Z dist init r=3, world=4 2025-12-04T09:59:13.4072916Z [rank2]:[W1204 09:41:32.789336263 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4073931Z [rank1]:[W1204 09:41:32.789805389 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4074955Z [rank0]:[W1204 09:41:32.791971395 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4075992Z [rank3]:[W1204 09:41:32.875642289 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4076089Z FAILED [49.6786s] [100%] 2025-12-04T09:59:13.4076094Z 2025-12-04T09:59:13.4076223Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4076580Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.4076694Z Traceback (most recent call last): 2025-12-04T09:59:13.4077180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4077280Z self._join_processes(fn) 2025-12-04T09:59:13.4077837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4077962Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4078503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4078602Z raise RuntimeError(error) 2025-12-04T09:59:13.4078807Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4078919Z Traceback (most recent call last): 2025-12-04T09:59:13.4079395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4079490Z getattr(self, test_name)() 2025-12-04T09:59:13.4080000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4080076Z fn() 2025-12-04T09:59:13.4080531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4080622Z method(*args, **kwargs) 2025-12-04T09:59:13.4081065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4081162Z method(*args, **kwargs) 2025-12-04T09:59:13.4081606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4081725Z with policy(): 2025-12-04T09:59:13.4082170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4082264Z raise RuntimeError(msg) 2025-12-04T09:59:13.4083444Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 611254272 and is now 10404954112. 
2025-12-04T09:59:13.4083452Z 2025-12-04T09:59:13.4083641Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4084344Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4084349Z 2025-12-04T09:59:13.4084581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4084588Z 2025-12-04T09:59:13.4084592Z 2025-12-04T09:59:13.4084784Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4085018Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.4085724Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5647de3303d26f02.xml - 2025-12-04T09:59:13.4085907Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4086742Z FAILED [49.6786s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4086849Z Traceback (most recent call last): 2025-12-04T09:59:13.4087341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4087438Z getattr(self, test_name)() 2025-12-04T09:59:13.4087912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4087991Z fn() 2025-12-04T09:59:13.4088435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4088554Z method(*args, **kwargs) 2025-12-04T09:59:13.4089004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4089097Z method(*args, **kwargs) 2025-12-04T09:59:13.4089536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4089619Z with policy(): 2025-12-04T09:59:13.4090070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4090166Z raise RuntimeError(msg) 2025-12-04T09:59:13.4091333Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T09:59:13.4091372Z 2025-12-04T09:59:13.4091562Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4092261Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4092266Z 2025-12-04T09:59:13.4092504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4092662Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
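
The UserWarnings repeated through both test sessions point at two setup details rather than at the leak itself: FSDP is handed a bare "cuda" device_id (so it falls back to the current device) together with a CPU-resident module, and the NO_SHARD strategy it ends up using is deprecated in favor of DistributedDataParallel. A short sketch of the setup those warnings recommend follows; it assumes a process group is initialized per rank and uses nn.Linear as a stand-in for the test's mixture-of-experts model, so it is illustrative rather than the test's actual wrapping code.

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_gpu(rank: int, world_size: int) -> FSDP:
    # Assumes MASTER_ADDR / MASTER_PORT are exported and one GPU is visible per rank.
    torch.cuda.set_device(rank)                  # gives "cuda" an explicit index
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    module = nn.Linear(8, 8)                     # stand-in for the MoE test model
    return FSDP(
        module,
        device_id=torch.cuda.current_device(),   # move the CPU module to GPU before sharding init
        sync_module_states=True,                 # requires the module on GPU, per the warning
    )
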
2025-12-04T09:59:13.4092879Z ====================== 1 failed, 26 deselected in 49.90s ======================= 2025-12-04T09:59:13.4092963Z Got exit code 1 2025-12-04T09:59:13.4093590Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4093962Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.4094512Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cff7e7504b276d84.xml 2025-12-04T09:59:13.4094653Z ============================= test session starts ============================== 2025-12-04T09:59:13.4094970Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4095067Z cachedir: .pytest_cache 2025-12-04T09:59:13.4095525Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4095631Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4095723Z configfile: pytest.ini 2025-12-04T09:59:13.4096200Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4096470Z collecting ... collected 60 items / 9 deselected / 51 selected 2025-12-04T09:59:13.4096597Z stepcurrent: skipping 9 already run items. 2025-12-04T09:59:13.4096911Z Running 18 items in this shard 2025-12-04T09:59:13.4096918Z 2025-12-04T09:59:13.4098054Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda I1204 09:41:59.014000 57939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 57991 2025-12-04T09:59:13.4098553Z I1204 09:41:59.015000 57939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 57992 2025-12-04T09:59:13.4099049Z I1204 09:41:59.015000 57939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 57993 2025-12-04T09:59:13.4099543Z I1204 09:41:59.016000 57939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 57994 2025-12-04T09:59:13.4101628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4101735Z _warn_cpu_init() 2025-12-04T09:59:13.4103739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4103866Z _warn_cpu_init() 2025-12-04T09:59:13.4104897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4104995Z _init_core_state( 2025-12-04T09:59:13.4106712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4106907Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4109039Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4109132Z _warn_cpu_init() 2025-12-04T09:59:13.4110029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4110124Z _init_core_state( 2025-12-04T09:59:13.4111638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4111799Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4113615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4113707Z _warn_cpu_init() 2025-12-04T09:59:13.4114603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4114701Z _init_core_state( 2025-12-04T09:59:13.4116239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.4116387Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4117292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4117378Z _init_core_state( 2025-12-04T09:59:13.4118902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4119073Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4120594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4120913Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4122756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4122930Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4124628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4124800Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4125802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4125927Z return func(*args, **kwargs) 2025-12-04T09:59:13.4126762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4126871Z return func(*args, **kwargs) 2025-12-04T09:59:13.4127651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4127757Z return func(*args, **kwargs) 2025-12-04T09:59:13.4128543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.4128653Z return func(*args, **kwargs) 2025-12-04T09:59:13.4129449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4129565Z return func(*args, **kwargs) 2025-12-04T09:59:13.4130319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4130435Z return func(*args, **kwargs) 2025-12-04T09:59:13.4131193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4131301Z return func(*args, **kwargs) 2025-12-04T09:59:13.4132068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4132211Z return func(*args, **kwargs) 2025-12-04T09:59:13.4132983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4133087Z return func(*args, **kwargs) 2025-12-04T09:59:13.4133668Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4134194Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4135164Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4135707Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4136930Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4137338Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4138306Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4138792Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4139758Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4140245Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4141244Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4141690Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4142660Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4143166Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4144964Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 714014720 and is now 10516103168. 2025-12-04T09:59:13.4145333Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4145989Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4147219Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4147622Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4148436Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4149080Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4149505Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4150044Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4150975Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4151465Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4152390Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4152761Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4153673Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4154132Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4155066Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4155522Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4156433Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4156847Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4157749Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4158245Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4159891Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.4160343Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4160926Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4162269Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4162612Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4163282Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4163806Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4164258Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4164764Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4165705Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4166193Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4167118Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4167489Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4168409Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4168893Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4169800Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4170254Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4171167Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4171587Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4172523Z [rank3]:E1204 09:42:22.639000 57994 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4172995Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4174676Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4175041Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4175626Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4176982Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4177351Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4178112Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4178657Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4179115Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4179660Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4180663Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4181178Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4182166Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4182564Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4183562Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4184051Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4185014Z [rank2]:E1204 09:42:22.640000 57993 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4185504Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4186499Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4186957Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4187923Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4188419Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4190253Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T09:59:13.4190998Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4191579Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4192671Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4193019Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4193667Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4194150Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4194240Z dist init r=2, world=4 2025-12-04T09:59:13.4194333Z dist init r=1, world=4 2025-12-04T09:59:13.4194419Z dist init r=0, world=4 2025-12-04T09:59:13.4194505Z dist init r=3, world=4 2025-12-04T09:59:13.4195534Z [rank2]:[W1204 09:42:23.660981110 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4196547Z [rank1]:[W1204 09:42:23.661337285 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4197596Z [rank0]:[W1204 09:42:23.661823435 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4198598Z [rank3]:[W1204 09:42:23.661941869 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4198700Z FAILED [42.4544s] [ 5%] 2025-12-04T09:59:13.4198705Z 2025-12-04T09:59:13.4198836Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4199177Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda _ 2025-12-04T09:59:13.4199291Z Traceback (most recent call last): 2025-12-04T09:59:13.4199809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4199918Z self._join_processes(fn) 2025-12-04T09:59:13.4200437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4200562Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4201108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4201210Z raise RuntimeError(error) 2025-12-04T09:59:13.4201417Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4201532Z Traceback (most recent call last): 2025-12-04T09:59:13.4202011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4202142Z getattr(self, test_name)() 2025-12-04T09:59:13.4202618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4202696Z fn() 2025-12-04T09:59:13.4203151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4203244Z method(*args, **kwargs) 2025-12-04T09:59:13.4203690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4203791Z method(*args, **kwargs) 2025-12-04T09:59:13.4204264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4204359Z with policy(): 2025-12-04T09:59:13.4204808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4204905Z raise RuntimeError(msg) 2025-12-04T09:59:13.4206080Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 714014720 and is now 10516103168. 
2025-12-04T09:59:13.4206085Z 2025-12-04T09:59:13.4206274Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4206960Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4206966Z 2025-12-04T09:59:13.4207200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4207205Z 2025-12-04T09:59:13.4207364Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4207469Z Traceback (most recent call last): 2025-12-04T09:59:13.4207958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4208063Z getattr(self, test_name)() 2025-12-04T09:59:13.4208564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4208640Z fn() 2025-12-04T09:59:13.4209093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4209185Z method(*args, **kwargs) 2025-12-04T09:59:13.4209640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4209729Z method(*args, **kwargs) 2025-12-04T09:59:13.4210173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4210268Z with policy(): 2025-12-04T09:59:13.4210743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4210840Z raise RuntimeError(msg) 2025-12-04T09:59:13.4211998Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4212003Z 2025-12-04T09:59:13.4212194Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4212881Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4212913Z 2025-12-04T09:59:13.4213147Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4213151Z 2025-12-04T09:59:13.4213155Z 2025-12-04T09:59:13.4213358Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4213592Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.4214304Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cff7e7504b276d84.xml - 2025-12-04T09:59:13.4214461Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4215289Z FAILED [42.4544s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4215429Z Traceback (most recent call last): 2025-12-04T09:59:13.4215918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4216016Z getattr(self, test_name)() 2025-12-04T09:59:13.4216585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4216837Z fn() 2025-12-04T09:59:13.4217358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4217464Z method(*args, **kwargs) 2025-12-04T09:59:13.4217969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4218083Z method(*args, **kwargs) 2025-12-04T09:59:13.4218588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4218680Z with policy(): 2025-12-04T09:59:13.4219200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4219307Z raise RuntimeError(msg) 2025-12-04T09:59:13.4220652Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 714014720 and is now 10516103168. 
2025-12-04T09:59:13.4220659Z 2025-12-04T09:59:13.4221106Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4221880Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4221897Z 2025-12-04T09:59:13.4222163Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4222171Z 2025-12-04T09:59:13.4222331Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4222459Z Traceback (most recent call last): 2025-12-04T09:59:13.4223076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4223191Z getattr(self, test_name)() 2025-12-04T09:59:13.4223734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4223822Z fn() 2025-12-04T09:59:13.4224336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4224443Z method(*args, **kwargs) 2025-12-04T09:59:13.4224950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4225065Z method(*args, **kwargs) 2025-12-04T09:59:13.4225608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4225705Z with policy(): 2025-12-04T09:59:13.4226225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4226334Z raise RuntimeError(msg) 2025-12-04T09:59:13.4227646Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4227651Z 2025-12-04T09:59:13.4227901Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4228680Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4228688Z 2025-12-04T09:59:13.4228947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4229126Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.4229310Z ======================= 1 failed, 9 deselected in 42.67s ======================= 2025-12-04T09:59:13.4229402Z Got exit code 1 2025-12-04T09:59:13.4229505Z Retrying single test... 
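Each failing rank prints the same repro line (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda). The check behind that flag compares caching-allocator and driver-level memory counters before and after the test body. A rough, hypothetical sketch of the same comparison done by hand (the helper name and threshold are illustrative, not the actual check raised from common_utils.py:2705 in the traceback above):

import torch

def check_cuda_leak(fn, device=0):
    # Hypothetical illustration of the idea behind PYTORCH_TEST_CUDA_MEM_LEAK_CHECK.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, _ = torch.cuda.mem_get_info(device)  # driver-level (free, total)

    fn()  # run the test body

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    if alloc_after > alloc_before or free_after < free_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: "
            f"allocator {alloc_before} -> {alloc_after}, "
            f"driver free {free_before} -> {free_after}"
        )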
2025-12-04T09:59:13.4230140Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d2fb83ab3ccdeb6.xml 2025-12-04T09:59:13.4230301Z ============================= test session starts ============================== 2025-12-04T09:59:13.4230657Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4230763Z cachedir: .pytest_cache 2025-12-04T09:59:13.4231276Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4231409Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4231513Z configfile: pytest.ini 2025-12-04T09:59:13.4232087Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4232308Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.4233387Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4233495Z Running 1 items in this shard 2025-12-04T09:59:13.4233502Z 2025-12-04T09:59:13.4234500Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda I1204 09:42:46.014000 59188 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 59240 2025-12-04T09:59:13.4234950Z I1204 09:42:46.015000 59188 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 59241 2025-12-04T09:59:13.4235415Z I1204 09:42:46.016000 59188 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 59242 2025-12-04T09:59:13.4235849Z I1204 09:42:46.017000 59188 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 59243 2025-12-04T09:59:13.4237654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4237767Z _warn_cpu_init() 2025-12-04T09:59:13.4239560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4239645Z _warn_cpu_init() 2025-12-04T09:59:13.4241424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4241540Z _warn_cpu_init() 2025-12-04T09:59:13.4242453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4242537Z _init_core_state( 2025-12-04T09:59:13.4243431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4243520Z _init_core_state( 2025-12-04T09:59:13.4244417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4244510Z _init_core_state( 2025-12-04T09:59:13.4246064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4246212Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4247729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4247875Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4249415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4249563Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4251349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4251462Z _warn_cpu_init() 2025-12-04T09:59:13.4252371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 
2025-12-04T09:59:13.4252453Z _init_core_state( 2025-12-04T09:59:13.4253968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4254151Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4255668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4255820Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4257664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4257845Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4258834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4258956Z return func(*args, **kwargs) 2025-12-04T09:59:13.4260697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4260860Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4261644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4261757Z return func(*args, **kwargs) 2025-12-04T09:59:13.4262525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4262635Z return func(*args, **kwargs) 2025-12-04T09:59:13.4263423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4263539Z return func(*args, **kwargs) 2025-12-04T09:59:13.4264297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.4264411Z return func(*args, **kwargs) 2025-12-04T09:59:13.4265161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4265267Z return func(*args, **kwargs) 2025-12-04T09:59:13.4266027Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4266164Z return func(*args, **kwargs) 2025-12-04T09:59:13.4266926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4267029Z return func(*args, **kwargs) 2025-12-04T09:59:13.4267786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4267897Z return func(*args, **kwargs) 2025-12-04T09:59:13.4268384Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4269030Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4269937Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4270389Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4271275Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4271633Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4272489Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4273098Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4274038Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4274496Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4275585Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4276029Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4276994Z [rank1]:E1204 09:43:18.699000 59241
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4277478Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4279175Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.4279536Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4280201Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4281394Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4281755Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4282447Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4283007Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4283449Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4283969Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4285027Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4285506Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4286441Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4286922Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4287811Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4288247Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4289100Z [rank0]:E1204 09:43:18.700000 59240 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4289531Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4290386Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4290813Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4291675Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4292112Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4293669Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 707723264 and is now 10516103168. 2025-12-04T09:59:13.4294032Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4294615Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4295696Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4296063Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4296937Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4297502Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4297960Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4298498Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4299493Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4300005Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4300999Z [rank2]:E1204 09:43:18.701000 59242 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4301434Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4302406Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4302888Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4303850Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4304339Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4305315Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4305769Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4306726Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4307221Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4309192Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.4309553Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4310137Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4311227Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4311589Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4312224Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4312711Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4313110Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4313586Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4314474Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4314922Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4315833Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4316186Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4317045Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4317481Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4318363Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4318795Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4319643Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4320043Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4321048Z [rank3]:E1204 09:43:18.701000 59243 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4321749Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4323501Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4323874Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4324589Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4325810Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4326180Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4326897Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4327443Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4327548Z dist init r=0, world=4 2025-12-04T09:59:13.4327643Z dist init r=3, world=4 2025-12-04T09:59:13.4327748Z dist init r=1, world=4 2025-12-04T09:59:13.4327843Z dist init r=2, world=4 2025-12-04T09:59:13.4329006Z [rank1]:[W1204 09:43:19.704075392 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4330188Z [rank0]:[W1204 09:43:19.704590489 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4331328Z [rank3]:[W1204 09:43:19.706943554 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4332462Z [rank2]:[W1204 09:43:19.717925474 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4332564Z FAILED [48.2722s] [100%] 2025-12-04T09:59:13.4332569Z 2025-12-04T09:59:13.4332764Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4333154Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda _ 2025-12-04T09:59:13.4333278Z Traceback (most recent call last): 2025-12-04T09:59:13.4333905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4334006Z self._join_processes(fn) 2025-12-04T09:59:13.4334527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4334650Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4335194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4335330Z raise RuntimeError(error) 2025-12-04T09:59:13.4335538Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.4335654Z Traceback (most recent call last): 2025-12-04T09:59:13.4336137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4336236Z getattr(self, test_name)() 2025-12-04T09:59:13.4336978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4337068Z fn() 2025-12-04T09:59:13.4337628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4337728Z method(*args, **kwargs) 2025-12-04T09:59:13.4338228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4338343Z method(*args, **kwargs) 2025-12-04T09:59:13.4338852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4338949Z with policy(): 2025-12-04T09:59:13.4339460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4339566Z raise RuntimeError(msg) 2025-12-04T09:59:13.4340884Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.4340893Z 2025-12-04T09:59:13.4341108Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4341880Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4341891Z 2025-12-04T09:59:13.4342179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4342185Z 2025-12-04T09:59:13.4342190Z 2025-12-04T09:59:13.4342407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4342674Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.4343473Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d2fb83ab3ccdeb6.xml - 2025-12-04T09:59:13.4343656Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4344578Z FAILED [48.2722s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.4344701Z Traceback (most recent call last): 2025-12-04T09:59:13.4345285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4345398Z getattr(self, test_name)() 2025-12-04T09:59:13.4345940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4346027Z fn() 2025-12-04T09:59:13.4346531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4346642Z method(*args, **kwargs) 2025-12-04T09:59:13.4347145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4347248Z method(*args, **kwargs) 2025-12-04T09:59:13.4347801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4347897Z with policy(): 2025-12-04T09:59:13.4348411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4348516Z raise RuntimeError(msg) 2025-12-04T09:59:13.4349820Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4349857Z 2025-12-04T09:59:13.4350054Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4350729Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4350736Z 2025-12-04T09:59:13.4350976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4351136Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
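Note on the leak check reported above: the harness flags a leak by comparing CUDA caching-allocator and driver allocations taken before and after the test (here the allocator went from 512 to 80384 bytes on every device), and the printed repro line with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 re-enables that comparison outside CI. The sketch below is only a rough stand-in for reproducing the same measurement locally; the helper name and the zero-byte tolerance are illustrative assumptions, not the test suite's actual implementation.

import torch

def check_cuda_leak(fn, device=0, tolerance_bytes=0):
    # Simplified stand-in for the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK comparison:
    # snapshot caching-allocator usage before and after running `fn`.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    before = torch.cuda.memory_allocated(device)

    fn()

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    after = torch.cuda.memory_allocated(device)

    leaked = after - before
    if leaked > tolerance_bytes:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: "
            f"allocated memory was {before} and is now {after}"
        )
    return leaked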
2025-12-04T09:59:13.4351293Z ====================== 1 failed, 26 deselected in 48.49s ======================= 2025-12-04T09:59:13.4351382Z Got exit code 1 2025-12-04T09:59:13.4351475Z Retrying single test... 2025-12-04T09:59:13.4352028Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd911142cc34300e.xml 2025-12-04T09:59:13.4352169Z ============================= test session starts ============================== 2025-12-04T09:59:13.4352477Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4352579Z cachedir: .pytest_cache 2025-12-04T09:59:13.4353033Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4353138Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4353243Z configfile: pytest.ini 2025-12-04T09:59:13.4353741Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4353940Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.4354692Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4354791Z Running 1 items in this shard 2025-12-04T09:59:13.4354797Z 2025-12-04T09:59:13.4355801Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda I1204 09:43:39.494000 60437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 60489 2025-12-04T09:59:13.4356241Z I1204 09:43:39.495000 60437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 60490 2025-12-04T09:59:13.4356713Z I1204 09:43:39.496000 60437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 60491 2025-12-04T09:59:13.4357144Z I1204 09:43:39.497000 60437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 60492 2025-12-04T09:59:13.4358972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4359086Z _warn_cpu_init() 2025-12-04T09:59:13.4360879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4360973Z _warn_cpu_init() 2025-12-04T09:59:13.4361874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4361993Z _init_core_state( 2025-12-04T09:59:13.4362888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4362984Z _init_core_state( 2025-12-04T09:59:13.4364503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4364652Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4366170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4366319Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4368135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4368222Z _warn_cpu_init() 2025-12-04T09:59:13.4370020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4370129Z _warn_cpu_init() 2025-12-04T09:59:13.4371038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4371125Z _init_core_state( 2025-12-04T09:59:13.4372631Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.4372815Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4373712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4373805Z _init_core_state( 2025-12-04T09:59:13.4375318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4375492Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4377286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4377463Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4379160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4379322Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4381029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4381228Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4382237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4382351Z return func(*args, **kwargs) 2025-12-04T09:59:13.4383136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4383250Z return func(*args, **kwargs) 2025-12-04T09:59:13.4384010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4384135Z return func(*args, **kwargs) 2025-12-04T09:59:13.4384928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.4385047Z return func(*args, **kwargs) 2025-12-04T09:59:13.4385810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4385920Z return func(*args, **kwargs) 2025-12-04T09:59:13.4386687Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4386797Z return func(*args, **kwargs) 2025-12-04T09:59:13.4387583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4387704Z return func(*args, **kwargs) 2025-12-04T09:59:13.4388464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4388580Z return func(*args, **kwargs) 2025-12-04T09:59:13.4389381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4389510Z return func(*args, **kwargs) 2025-12-04T09:59:13.4389925Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4390399Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4391305Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4391759Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4392649Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4393004Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4393855Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4394301Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4395180Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4395627Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4396476Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4396888Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4397834Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4398268Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4399836Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T09:59:13.4400160Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4400778Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4401863Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4402195Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4402830Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4403341Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4403750Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4404226Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4405121Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4405573Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4406462Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4406813Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4407695Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4411818Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4412735Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4413181Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4414038Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4414512Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4415372Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4415814Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4417767Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 
2025-12-04T09:59:13.4418193Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4418854Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4420078Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4420479Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4421413Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4421975Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4422430Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4422968Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4423966Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4424480Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4425470Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4425937Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4426906Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4427391Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4428353Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4428836Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4429840Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4430294Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4431253Z [rank3]:E1204 09:44:03.504000 60492 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4431750Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4433569Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.4433900Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4434482Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4435833Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4436176Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4436846Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4437363Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4437786Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4438286Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4439228Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4439711Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4440662Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4441032Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4441934Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4442392Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4443327Z [rank2]:E1204 09:44:03.504000 60491 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4443786Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4444689Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4445113Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4446018Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4446517Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4448187Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4448517Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4449127Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4450219Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4450539Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4451168Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4451655Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4451745Z dist init r=1, world=4 2025-12-04T09:59:13.4451837Z dist init r=0, world=4 2025-12-04T09:59:13.4451921Z dist init r=3, world=4 2025-12-04T09:59:13.4452008Z dist init r=2, world=4 2025-12-04T09:59:13.4453048Z [rank1]:[W1204 09:44:03.462422849 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4454102Z [rank0]:[W1204 09:44:03.472381630 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4455119Z [rank3]:[W1204 09:44:03.472423753 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4456127Z [rank2]:[W1204 09:44:03.537324995 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4456223Z FAILED [43.7497s] [100%] 2025-12-04T09:59:13.4456255Z 2025-12-04T09:59:13.4456460Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4456986Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda _ 2025-12-04T09:59:13.4457113Z Traceback (most recent call last): 2025-12-04T09:59:13.4457661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4457772Z self._join_processes(fn) 2025-12-04T09:59:13.4458363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4458501Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4459153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4459263Z raise RuntimeError(error) 2025-12-04T09:59:13.4459496Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4459624Z Traceback (most recent call last): 2025-12-04T09:59:13.4460163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4460274Z getattr(self, test_name)() 2025-12-04T09:59:13.4460817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4460938Z fn() 2025-12-04T09:59:13.4461448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4461552Z method(*args, **kwargs) 2025-12-04T09:59:13.4462059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4462168Z method(*args, **kwargs) 2025-12-04T09:59:13.4462674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4462778Z with policy(): 2025-12-04T09:59:13.4463285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4463389Z raise RuntimeError(msg) 2025-12-04T09:59:13.4464701Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 
2025-12-04T09:59:13.4464711Z 2025-12-04T09:59:13.4464926Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4465705Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4465712Z 2025-12-04T09:59:13.4466003Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4466009Z 2025-12-04T09:59:13.4466170Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4466294Z Traceback (most recent call last): 2025-12-04T09:59:13.4466847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4466960Z getattr(self, test_name)() 2025-12-04T09:59:13.4467497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4467586Z fn() 2025-12-04T09:59:13.4468097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4468201Z method(*args, **kwargs) 2025-12-04T09:59:13.4468729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4468956Z method(*args, **kwargs) 2025-12-04T09:59:13.4469445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4469544Z with policy(): 2025-12-04T09:59:13.4470033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4470137Z raise RuntimeError(msg) 2025-12-04T09:59:13.4471402Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 
2025-12-04T09:59:13.4471439Z 2025-12-04T09:59:13.4471646Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4472394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4472400Z 2025-12-04T09:59:13.4472653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4472658Z 2025-12-04T09:59:13.4472819Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4472931Z Traceback (most recent call last): 2025-12-04T09:59:13.4473458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4473596Z getattr(self, test_name)() 2025-12-04T09:59:13.4474113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4474199Z fn() 2025-12-04T09:59:13.4474697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4474798Z method(*args, **kwargs) 2025-12-04T09:59:13.4475299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4475401Z method(*args, **kwargs) 2025-12-04T09:59:13.4475889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4475987Z with policy(): 2025-12-04T09:59:13.4476479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4476580Z raise RuntimeError(msg) 2025-12-04T09:59:13.4477840Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4477847Z 2025-12-04T09:59:13.4478078Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4478934Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4478940Z 2025-12-04T09:59:13.4479188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4479196Z 2025-12-04T09:59:13.4479200Z 2025-12-04T09:59:13.4479411Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4479657Z Process 0 terminated with exit code 10, terminating remaining processes. 
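Note on the ProcessGroupNCCL warnings in this run: each rank exits without calling destroy_process_group(), which is what triggers the "can leak resources" message, and the earlier barrier() warning points at passing device_id to init_process_group. A minimal teardown pattern under those recommendations is sketched below; the RANK/WORLD_SIZE environment wiring is assumed to come from a launcher such as torchrun and is not taken from this test.

import os
import torch
import torch.distributed as dist

def main():
    # Assumed to be provided by the launcher (e.g. torchrun).
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])

    # Pin the process to its GPU before creating the process group.
    torch.cuda.set_device(rank)

    # Passing device_id is the knob the barrier() warning mentions.
    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device(f"cuda:{rank}"),
    )
    try:
        dist.barrier()
        # ... test or training body ...
    finally:
        # Explicit teardown avoids the "destroy_process_group() was not
        # called before program exit" warning seen above.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()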
2025-12-04T09:59:13.4480408Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd911142cc34300e.xml - 2025-12-04T09:59:13.4480605Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4481479Z FAILED [43.7497s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4481599Z Traceback (most recent call last): 2025-12-04T09:59:13.4482114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4482221Z getattr(self, test_name)() 2025-12-04T09:59:13.4482730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4482812Z fn() 2025-12-04T09:59:13.4483323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4483420Z method(*args, **kwargs) 2025-12-04T09:59:13.4483895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4483997Z method(*args, **kwargs) 2025-12-04T09:59:13.4484468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4484558Z with policy(): 2025-12-04T09:59:13.4485043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4485169Z raise RuntimeError(msg) 2025-12-04T09:59:13.4486604Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 
2025-12-04T09:59:13.4486614Z 2025-12-04T09:59:13.4486821Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4487567Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4487579Z 2025-12-04T09:59:13.4487834Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4487839Z 2025-12-04T09:59:13.4487993Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4488113Z Traceback (most recent call last): 2025-12-04T09:59:13.4488639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4488743Z getattr(self, test_name)() 2025-12-04T09:59:13.4489269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4489351Z fn() 2025-12-04T09:59:13.4489844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4490082Z method(*args, **kwargs) 2025-12-04T09:59:13.4490723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4490832Z method(*args, **kwargs) 2025-12-04T09:59:13.4491315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4491410Z with policy(): 2025-12-04T09:59:13.4491909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4492014Z raise RuntimeError(msg) 2025-12-04T09:59:13.4493303Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 
2025-12-04T09:59:13.4493312Z 2025-12-04T09:59:13.4493517Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4494368Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4494374Z 2025-12-04T09:59:13.4494621Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4494628Z 2025-12-04T09:59:13.4494776Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4494892Z Traceback (most recent call last): 2025-12-04T09:59:13.4495401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4495542Z getattr(self, test_name)() 2025-12-04T09:59:13.4496047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4496130Z fn() 2025-12-04T09:59:13.4496694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4496968Z method(*args, **kwargs) 2025-12-04T09:59:13.4497471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4497581Z method(*args, **kwargs) 2025-12-04T09:59:13.4498140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4498244Z with policy(): 2025-12-04T09:59:13.4498750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4498857Z raise RuntimeError(msg) 2025-12-04T09:59:13.4500157Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4500163Z 2025-12-04T09:59:13.4500376Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4501141Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4501148Z 2025-12-04T09:59:13.4501409Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4501585Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.4501773Z ====================== 1 failed, 26 deselected in 43.97s ======================= 2025-12-04T09:59:13.4501868Z Got exit code 1 2025-12-04T09:59:13.4502591Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4503000Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.4503618Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8e84025a0dc7a16.xml 2025-12-04T09:59:13.4503788Z ============================= test session starts ============================== 2025-12-04T09:59:13.4504135Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4504245Z cachedir: .pytest_cache 2025-12-04T09:59:13.4504754Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4504878Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4504981Z configfile: pytest.ini 2025-12-04T09:59:13.4505543Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4505762Z collecting ... collected 60 items / 10 deselected / 50 selected 2025-12-04T09:59:13.4505900Z stepcurrent: skipping 10 already run items. 2025-12-04T09:59:13.4506009Z Running 17 items in this shard 2025-12-04T09:59:13.4506015Z 2025-12-04T09:59:13.4507191Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda I1204 09:44:27.494000 61686 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 61738 2025-12-04T09:59:13.4507691Z I1204 09:44:27.495000 61686 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 61739 2025-12-04T09:59:13.4508218Z I1204 09:44:27.495000 61686 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 61740 2025-12-04T09:59:13.4508821Z I1204 09:44:27.496000 61686 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 61741 2025-12-04T09:59:13.4510758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4510882Z _warn_cpu_init() 2025-12-04T09:59:13.4512659Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4512753Z _warn_cpu_init() 2025-12-04T09:59:13.4514520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4514610Z _warn_cpu_init() 2025-12-04T09:59:13.4515536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4515626Z _init_core_state( 2025-12-04T09:59:13.4516567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4516651Z _init_core_state( 2025-12-04T09:59:13.4517574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4517661Z _init_core_state( 2025-12-04T09:59:13.4519205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4519357Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4521038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4521367Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4523079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4523308Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4525326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4525475Z _warn_cpu_init() 2025-12-04T09:59:13.4526512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4526616Z _init_core_state( 2025-12-04T09:59:13.4528318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4528487Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4530181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4530355Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4532090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4532254Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4533252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4533363Z return func(*args, **kwargs) 2025-12-04T09:59:13.4535053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4535199Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4535895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4535996Z return func(*args, **kwargs) 2025-12-04T09:59:13.4536931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4537098Z return func(*args, **kwargs) 2025-12-04T09:59:13.4537864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.4537979Z return func(*args, **kwargs) 2025-12-04T09:59:13.4538743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4538851Z return func(*args, **kwargs) 2025-12-04T09:59:13.4539613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4539754Z return func(*args, **kwargs) 2025-12-04T09:59:13.4540520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4540639Z return func(*args, **kwargs) 2025-12-04T09:59:13.4541398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4541512Z return func(*args, **kwargs) 2025-12-04T09:59:13.4542269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4542372Z return func(*args, **kwargs) 2025-12-04T09:59:13.4542839Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4543370Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4544373Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4544913Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4545910Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4546305Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4547270Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4547764Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4548749Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4549319Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4550167Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4550571Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4551424Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4551904Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4553495Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4553844Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4554433Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4555561Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4555892Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4556527Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4557010Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4557412Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4557881Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4558803Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4559250Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4560132Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4560480Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4561354Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4561795Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4562643Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4563076Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4563924Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4564354Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4565214Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4565644Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4567234Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 709820416 and is now 10516103168. 
2025-12-04T09:59:13.4567773Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4568399Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4569585Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4569927Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4570595Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4571103Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4571565Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4572063Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4573007Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4573483Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4574411Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4574806Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4575710Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4576167Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4577361Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4577858Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4578855Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4579304Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4580268Z [rank1]:E1204 09:45:00.188000 61739 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4580786Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4582580Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 600768512 and is now 10404954112. 2025-12-04T09:59:13.4582945Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4583607Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4584876Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4585248Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4585961Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4586531Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4586986Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4587510Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4588520Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4589234Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4590143Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4590495Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4591343Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4591783Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4592659Z [rank3]:E1204 09:45:00.189000 61741 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4593094Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4593942Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4594342Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4595218Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4595653Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4597252Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4597571Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4598162Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4599290Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4599642Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4600274Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4600756Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4600850Z dist init r=2, world=4 2025-12-04T09:59:13.4600935Z dist init r=1, world=4 2025-12-04T09:59:13.4601026Z dist init r=3, world=4 2025-12-04T09:59:13.4601111Z dist init r=0, world=4 2025-12-04T09:59:13.4602135Z [rank2]:[W1204 09:45:00.160275534 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4603193Z [rank1]:[W1204 09:45:00.163833887 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4604203Z [rank3]:[W1204 09:45:00.166140670 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4605214Z [rank0]:[W1204 09:45:00.172471670 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4605331Z FAILED [49.1429s] [ 5%] 2025-12-04T09:59:13.4605337Z 2025-12-04T09:59:13.4605473Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4605850Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda _ 2025-12-04T09:59:13.4605955Z Traceback (most recent call last): 2025-12-04T09:59:13.4606443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4606538Z self._join_processes(fn) 2025-12-04T09:59:13.4607053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4607210Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4607743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4607851Z raise RuntimeError(error) 2025-12-04T09:59:13.4608059Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4608164Z Traceback (most recent call last): 2025-12-04T09:59:13.4608648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4608748Z getattr(self, test_name)() 2025-12-04T09:59:13.4609215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4609295Z fn() 2025-12-04T09:59:13.4609744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4609834Z method(*args, **kwargs) 2025-12-04T09:59:13.4610284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4610373Z method(*args, **kwargs) 2025-12-04T09:59:13.4610816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4610998Z with policy(): 2025-12-04T09:59:13.4611446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4611545Z raise RuntimeError(msg) 2025-12-04T09:59:13.4612734Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 709820416 and is now 10516103168. 
2025-12-04T09:59:13.4612742Z 2025-12-04T09:59:13.4612930Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4613652Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4613683Z 2025-12-04T09:59:13.4613918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4613923Z 2025-12-04T09:59:13.4614071Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4614175Z Traceback (most recent call last): 2025-12-04T09:59:13.4614662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4614758Z getattr(self, test_name)() 2025-12-04T09:59:13.4615232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4615317Z fn() 2025-12-04T09:59:13.4615763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4615881Z method(*args, **kwargs) 2025-12-04T09:59:13.4616404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4616501Z method(*args, **kwargs) 2025-12-04T09:59:13.4617159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4617256Z with policy(): 2025-12-04T09:59:13.4617759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4617869Z raise RuntimeError(msg) 2025-12-04T09:59:13.4619254Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 600768512 and is now 10404954112. 
2025-12-04T09:59:13.4619262Z 2025-12-04T09:59:13.4619481Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4620288Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4620294Z 2025-12-04T09:59:13.4620554Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4620560Z 2025-12-04T09:59:13.4620726Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.4621060Z Traceback (most recent call last): 2025-12-04T09:59:13.4621621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4621732Z getattr(self, test_name)() 2025-12-04T09:59:13.4622264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4622360Z fn() 2025-12-04T09:59:13.4622862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4622962Z method(*args, **kwargs) 2025-12-04T09:59:13.4623538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4623640Z method(*args, **kwargs) 2025-12-04T09:59:13.4624144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4624239Z with policy(): 2025-12-04T09:59:13.4624748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4624857Z raise RuntimeError(msg) 2025-12-04T09:59:13.4626233Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4626242Z 2025-12-04T09:59:13.4626461Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4627269Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4627275Z 2025-12-04T09:59:13.4627533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4627547Z 2025-12-04T09:59:13.4627551Z 2025-12-04T09:59:13.4627766Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4628027Z Process 0 terminated with exit code 10, terminating remaining processes. 
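Each of the failures above comes from the harness's CUDA memory-leak check: it records the caching-allocator and driver-level memory counters before the test and compares them afterwards, and any growth that survives synchronization is reported as a leak, which is what the "allocated memory was 512 and is now reported as 80384" messages are saying. The following is only a rough sketch of that before/after bookkeeping using public torch.cuda APIs, not the actual leak-check implementation enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1; the helper name check_for_cuda_leak is made up for illustration.

import torch

def check_for_cuda_leak(test_fn, device=0):
    # Snapshot caching-allocator usage and driver-level free memory before the test.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)

    test_fn()

    # Let outstanding kernels finish and drop cached blocks before re-measuring.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    if alloc_after > alloc_before or free_after < free_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: "
            f"allocator {alloc_before} -> {alloc_after} bytes, "
            f"driver allocated {total - free_before} -> {total - free_after} bytes"
        )

In the multi-process runs above, the equivalent check runs on every rank, which is why all four processes report the same pattern and exit with code 10.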
2025-12-04T09:59:13.4628867Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8e84025a0dc7a16.xml - 2025-12-04T09:59:13.4629037Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4630002Z FAILED [49.1429s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4630120Z Traceback (most recent call last): 2025-12-04T09:59:13.4630660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4630810Z getattr(self, test_name)() 2025-12-04T09:59:13.4631343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4631431Z fn() 2025-12-04T09:59:13.4631936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4632038Z method(*args, **kwargs) 2025-12-04T09:59:13.4632664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4632763Z method(*args, **kwargs) 2025-12-04T09:59:13.4633329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4633420Z with policy(): 2025-12-04T09:59:13.4633867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4633961Z raise RuntimeError(msg) 2025-12-04T09:59:13.4635146Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 709820416 and is now 10516103168. 
2025-12-04T09:59:13.4635153Z 2025-12-04T09:59:13.4635340Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4636095Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4636101Z 2025-12-04T09:59:13.4636330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4636335Z 2025-12-04T09:59:13.4636480Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4636585Z Traceback (most recent call last): 2025-12-04T09:59:13.4637068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4637169Z getattr(self, test_name)() 2025-12-04T09:59:13.4637642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4637724Z fn() 2025-12-04T09:59:13.4638200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4638289Z method(*args, **kwargs) 2025-12-04T09:59:13.4638740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4638828Z method(*args, **kwargs) 2025-12-04T09:59:13.4639271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4639362Z with policy(): 2025-12-04T09:59:13.4639808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4639907Z raise RuntimeError(msg) 2025-12-04T09:59:13.4641125Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 600768512 and is now 10404954112. 
2025-12-04T09:59:13.4641130Z 2025-12-04T09:59:13.4641318Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4642040Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4642045Z 2025-12-04T09:59:13.4642277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4642306Z 2025-12-04T09:59:13.4642454Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.4642560Z Traceback (most recent call last): 2025-12-04T09:59:13.4643046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4643149Z getattr(self, test_name)() 2025-12-04T09:59:13.4643620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4643704Z fn() 2025-12-04T09:59:13.4644148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4644239Z method(*args, **kwargs) 2025-12-04T09:59:13.4644690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4644781Z method(*args, **kwargs) 2025-12-04T09:59:13.4645227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4645309Z with policy(): 2025-12-04T09:59:13.4645757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4645856Z raise RuntimeError(msg) 2025-12-04T09:59:13.4647073Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4647079Z 2025-12-04T09:59:13.4647273Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4647989Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4647996Z 2025-12-04T09:59:13.4648225Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4648390Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.4648546Z ====================== 1 failed, 10 deselected in 49.36s ======================= 2025-12-04T09:59:13.4648661Z Got exit code 1 2025-12-04T09:59:13.4648754Z Retrying single test... 
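Aside from the leak itself, the UserWarnings interleaved in this output all suggest the same initialization pattern: the process group is built without binding a device and `device_id` is passed as a bare "cuda" string, so FSDP runs its sharding initialization on CPU and has to guess the device index, barrier() falls back to the ambient device, and NCCL warns at exit that destroy_process_group() was never called. Below is a hedged sketch of the setup those warnings recommend; rank, world_size, and the function names are placeholders rather than anything from the test itself.

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def init_distributed(rank, world_size):
    # Bind this process to one GPU before any collective or FSDP call.
    torch.cuda.set_device(rank)
    dist.init_process_group(
        "nccl",
        rank=rank,
        world_size=world_size,
        # Passing a concrete device is what the barrier() warning suggests.
        device_id=torch.device(f"cuda:{rank}"),
    )

def wrap_model(model):
    # An explicit device index avoids the "device_id cuda ... does not have an
    # explicit index" warning and moves the module to GPU for sharding init.
    return FSDP(model, device_id=torch.cuda.current_device())

def shutdown():
    # Avoids the ProcessGroupNCCL "destroy_process_group() was not called" warning.
    dist.destroy_process_group()

None of this addresses the reported leak, but it should remove the CPU-init, device-index, barrier, and shutdown warnings that otherwise pad the retry output that follows.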
2025-12-04T09:59:13.4649302Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-392d2e7951c1c5f3.xml 2025-12-04T09:59:13.4649453Z ============================= test session starts ============================== 2025-12-04T09:59:13.4649760Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4649853Z cachedir: .pytest_cache 2025-12-04T09:59:13.4650316Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4650423Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4650546Z configfile: pytest.ini 2025-12-04T09:59:13.4651020Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4651212Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.4652014Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4652119Z Running 1 items in this shard 2025-12-04T09:59:13.4652124Z 2025-12-04T09:59:13.4653155Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda I1204 09:45:20.943000 62935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 62987 2025-12-04T09:59:13.4653628Z I1204 09:45:20.944000 62935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 62988 2025-12-04T09:59:13.4654062Z I1204 09:45:20.945000 62935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 62989 2025-12-04T09:59:13.4654508Z I1204 09:45:20.946000 62935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 62990 2025-12-04T09:59:13.4656381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4656486Z _warn_cpu_init() 2025-12-04T09:59:13.4658680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4658789Z _warn_cpu_init() 2025-12-04T09:59:13.4660781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4660886Z _warn_cpu_init() 2025-12-04T09:59:13.4661920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4662046Z _init_core_state( 2025-12-04T09:59:13.4663082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4663176Z _init_core_state( 2025-12-04T09:59:13.4664208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4664304Z _init_core_state( 2025-12-04T09:59:13.4666011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4666213Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4667921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4668123Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4669939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4670095Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4671880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4671971Z _warn_cpu_init() 2025-12-04T09:59:13.4672881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
2025-12-04T09:59:13.4672965Z _init_core_state( 2025-12-04T09:59:13.4674522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4674663Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4676175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4676320Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4677866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4678006Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4678886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4678984Z return func(*args, **kwargs) 2025-12-04T09:59:13.4680516Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4680663Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4681346Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4681449Z return func(*args, **kwargs) 2025-12-04T09:59:13.4682154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4682254Z return func(*args, **kwargs) 2025-12-04T09:59:13.4682938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4683032Z return func(*args, **kwargs) 2025-12-04T09:59:13.4683710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.4683802Z return func(*args, **kwargs) 2025-12-04T09:59:13.4684467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4684568Z return func(*args, **kwargs) 2025-12-04T09:59:13.4685235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4685336Z return func(*args, **kwargs) 2025-12-04T09:59:13.4686006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4686123Z return func(*args, **kwargs) 2025-12-04T09:59:13.4686799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4686890Z return func(*args, **kwargs) 2025-12-04T09:59:13.4687297Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4687771Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4688658Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4689142Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4690014Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4690367Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4691218Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4691678Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4692533Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4692963Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4693814Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4694243Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4695103Z [rank2]:E1204 09:45:44.175000 62989
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4695541Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4697457Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4697830Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4698490Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4699803Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4700168Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4700886Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4701427Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4701878Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4702411Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4703437Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4703949Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4704942Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4705345Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4706334Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4706824Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4707775Z [rank0]:E1204 09:45:44.175000 62987 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4708258Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4709328Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4709723Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4710581Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4711015Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4712607Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 2025-12-04T09:59:13.4712931Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4713539Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4714665Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4714986Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4715631Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4716112Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4716537Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4717007Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4717895Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4718346Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4719222Z [rank1]:E1204 09:45:44.176000 62988 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4719606Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4720456Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4721033Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4722154Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4722711Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4723687Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4724130Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4725097Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4725580Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4727401Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.4727803Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4728461Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4729731Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4730094Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4730847Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4731391Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4731845Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4732371Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4733369Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4733993Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4734869Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4735221Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4736070Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4736631Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4737734Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4738227Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4739193Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4739636Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4740601Z [rank3]:E1204 09:45:44.177000 62990 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4741090Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4742920Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4743286Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4743947Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4745221Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4745613Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4746338Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4746878Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4746983Z dist init r=1, world=4 2025-12-04T09:59:13.4747079Z dist init r=2, world=4 2025-12-04T09:59:13.4747171Z dist init r=0, world=4 2025-12-04T09:59:13.4747273Z dist init r=3, world=4 2025-12-04T09:59:13.4748432Z [rank1]:[W1204 09:45:44.144689821 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4749717Z [rank2]:[W1204 09:45:44.145283891 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4750786Z [rank0]:[W1204 09:45:44.148974604 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4751883Z [rank3]:[W1204 09:45:44.152390307 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4751985Z FAILED [41.1533s] [100%] 2025-12-04T09:59:13.4751991Z 2025-12-04T09:59:13.4752128Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4752531Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda _ 2025-12-04T09:59:13.4752640Z Traceback (most recent call last): 2025-12-04T09:59:13.4753333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4753451Z self._join_processes(fn) 2025-12-04T09:59:13.4754016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4754156Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4754738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4754846Z raise RuntimeError(error) 2025-12-04T09:59:13.4755076Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4755189Z Traceback (most recent call last): 2025-12-04T09:59:13.4755740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4755850Z getattr(self, test_name)() 2025-12-04T09:59:13.4756366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4756455Z fn() 2025-12-04T09:59:13.4756943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4757045Z method(*args, **kwargs) 2025-12-04T09:59:13.4757536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4757635Z method(*args, **kwargs) 2025-12-04T09:59:13.4758156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4758248Z with policy(): 2025-12-04T09:59:13.4758745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4758853Z raise RuntimeError(msg) 2025-12-04T09:59:13.4760150Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.4760158Z 2025-12-04T09:59:13.4760371Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4761162Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4761196Z 2025-12-04T09:59:13.4761453Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4761460Z 2025-12-04T09:59:13.4761469Z 2025-12-04T09:59:13.4761677Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4761931Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.4762715Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-392d2e7951c1c5f3.xml - 2025-12-04T09:59:13.4762909Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4763843Z FAILED [41.1533s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4763962Z Traceback (most recent call last): 2025-12-04T09:59:13.4764494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4764607Z getattr(self, test_name)() 2025-12-04T09:59:13.4765124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4765206Z fn() 2025-12-04T09:59:13.4765696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4765797Z method(*args, **kwargs) 2025-12-04T09:59:13.4766283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4766380Z method(*args, **kwargs) 2025-12-04T09:59:13.4766866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4766962Z with policy(): 2025-12-04T09:59:13.4767482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4767585Z raise RuntimeError(msg) 2025-12-04T09:59:13.4768881Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4768890Z 2025-12-04T09:59:13.4769204Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4769974Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4769981Z 2025-12-04T09:59:13.4770226Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4770428Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
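
Annotation: the failure itself is the CUDA memory leak check tripping. The harness records caching-allocator and driver-level memory on each device before the test and compares them afterwards, which is where the "512 ... now reported as 80384" and "604962816 ... now 10404954112" figures come from; the quoted repro line with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 re-enables that check standalone, and PYTORCH_PRINT_REPRO_ON_FAILURE=0 suppresses the repro banner. A rough sketch of the kind of comparison being made (not the actual check in common_utils.py; `cuda_mem_snapshot` is an illustrative helper):

    import torch

    def cuda_mem_snapshot(device: int):
        # Bytes currently held by tensors via PyTorch's caching allocator.
        allocator = torch.cuda.memory_allocated(device)
        # Driver-level view: total minus free is everything the CUDA driver has
        # handed out on this device (cached blocks, NCCL buffers, library handles).
        free, total = torch.cuda.mem_get_info(device)
        return allocator, total - free

    before_alloc, before_driver = cuda_mem_snapshot(0)
    # ... run the test body under suspicion here ...
    after_alloc, after_driver = cuda_mem_snapshot(0)

    if after_alloc > before_alloc and after_driver > before_driver:
        raise RuntimeError(
            f"possible leak: allocator {before_alloc} -> {after_alloc} bytes, "
            f"driver {before_driver} -> {after_driver} bytes"
        )
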
2025-12-04T09:59:13.4770595Z ====================== 1 failed, 26 deselected in 41.37s ======================= 2025-12-04T09:59:13.4770683Z Got exit code 1 2025-12-04T09:59:13.4770789Z Retrying single test... 2025-12-04T09:59:13.4771370Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-477ee10c9167da98.xml 2025-12-04T09:59:13.4771519Z ============================= test session starts ============================== 2025-12-04T09:59:13.4771858Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4771960Z cachedir: .pytest_cache 2025-12-04T09:59:13.4772446Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4772584Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4772681Z configfile: pytest.ini 2025-12-04T09:59:13.4773191Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4773394Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.4774244Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4774352Z Running 1 items in this shard 2025-12-04T09:59:13.4774383Z 2025-12-04T09:59:13.4775474Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda I1204 09:46:06.984000 64184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 64236 2025-12-04T09:59:13.4775945Z I1204 09:46:06.985000 64184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 64237 2025-12-04T09:59:13.4776659Z I1204 09:46:06.986000 64184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 64238 2025-12-04T09:59:13.4777322Z I1204 09:46:06.986000 64184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 64239 2025-12-04T09:59:13.4779362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4779471Z _warn_cpu_init() 2025-12-04T09:59:13.4781514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4781611Z _warn_cpu_init() 2025-12-04T09:59:13.4783620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4783718Z _warn_cpu_init() 2025-12-04T09:59:13.4784793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4784888Z _init_core_state( 2025-12-04T09:59:13.4786616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4786780Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4787813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4787945Z _init_core_state( 2025-12-04T09:59:13.4789683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4789834Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4791645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4791735Z _warn_cpu_init() 2025-12-04T09:59:13.4792643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4792731Z _init_core_state( 2025-12-04T09:59:13.4794241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.4794387Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4795327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4795411Z _init_core_state( 2025-12-04T09:59:13.4797148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4797303Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4798945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4799106Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4800040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4800146Z return func(*args, **kwargs) 2025-12-04T09:59:13.4801747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4801932Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4803534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4803690Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4804410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4804539Z return func(*args, **kwargs) 2025-12-04T09:59:13.4805258Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4805361Z return func(*args, **kwargs) 2025-12-04T09:59:13.4806081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
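
Annotation: the `_warn_cpu_init` warnings in this retry describe the other common pattern: the module is built on CPU, so FSDP should be given a device to move it to, which `sync_module_states=True` also requires. A short sketch under the assumption that a process group is already initialized as in the earlier snippet; `cpu_model` and `sharded` are placeholder names:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    cpu_model = nn.Linear(8, 8)  # constructed on the host, as in the warning

    # An explicit device_id lets FSDP move the module to the local GPU for the
    # sharding initialization, which sync_module_states=True also requires.
    sharded = FSDP(
        cpu_model,
        device_id=torch.cuda.current_device(),
        sync_module_states=True,
    )
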
2025-12-04T09:59:13.4806178Z return func(*args, **kwargs) 2025-12-04T09:59:13.4806887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4806996Z return func(*args, **kwargs) 2025-12-04T09:59:13.4807881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4807991Z return func(*args, **kwargs) 2025-12-04T09:59:13.4808725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4808829Z return func(*args, **kwargs) 2025-12-04T09:59:13.4809597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4809700Z return func(*args, **kwargs) 2025-12-04T09:59:13.4810436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4810536Z return func(*args, **kwargs) 2025-12-04T09:59:13.4810983Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4811509Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4812507Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4813004Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4813962Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4814345Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4815280Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4815782Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4816964Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4817451Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4818518Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4818964Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4819924Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4820418Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4822442Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 720306176 and is now 10516103168. 2025-12-04T09:59:13.4822825Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4823485Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4824824Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4825188Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4825907Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4826451Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4826907Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4827479Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4828482Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4828994Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4829985Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4830416Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4831381Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4831863Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4832935Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4833404Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4834260Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4834651Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4835507Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4835947Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4837539Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.4837898Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4838477Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4839789Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4840130Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4840809Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4841347Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4841771Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4842271Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4843212Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4843689Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4844649Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4845019Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4845928Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4846559Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4847524Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4847996Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4848925Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4849352Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4850283Z [rank1]:E1204 09:46:30.094000 64237 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4850766Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4852708Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4853038Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4853622Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4854745Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4855090Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4855732Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4856211Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4856847Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4857393Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4858393Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4858948Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4859933Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4860329Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4861323Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4861811Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4862784Z [rank3]:E1204 09:46:30.095000 64239 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4863266Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4864233Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4864677Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4865632Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4866163Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4867966Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.4868336Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4869083Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4870312Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4870653Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4871328Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4871839Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4871933Z dist init r=0, world=4 2025-12-04T09:59:13.4872028Z dist init r=3, world=4 2025-12-04T09:59:13.4872144Z dist init r=2, world=4 2025-12-04T09:59:13.4872233Z dist init r=1, world=4 2025-12-04T09:59:13.4873324Z [rank0]:[W1204 09:46:30.063605831 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4874401Z [rank3]:[W1204 09:46:30.067176115 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4875478Z [rank2]:[W1204 09:46:30.067701587 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4876569Z [rank1]:[W1204 09:46:30.073377698 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4876670Z FAILED [41.7880s] [100%] 2025-12-04T09:59:13.4876677Z 2025-12-04T09:59:13.4876815Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4877315Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda _ 2025-12-04T09:59:13.4877425Z Traceback (most recent call last): 2025-12-04T09:59:13.4877906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4878012Z self._join_processes(fn) 2025-12-04T09:59:13.4878524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4878647Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4879191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4879290Z raise RuntimeError(error) 2025-12-04T09:59:13.4879886Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4879993Z Traceback (most recent call last): 2025-12-04T09:59:13.4880474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4880580Z getattr(self, test_name)() 2025-12-04T09:59:13.4881050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4881131Z fn() 2025-12-04T09:59:13.4881589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4881686Z method(*args, **kwargs) 2025-12-04T09:59:13.4882169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4882258Z method(*args, **kwargs) 2025-12-04T09:59:13.4882706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4882795Z with policy(): 2025-12-04T09:59:13.4883242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4883338Z raise RuntimeError(msg) 2025-12-04T09:59:13.4884536Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 720306176 and is now 10516103168. 
2025-12-04T09:59:13.4884573Z 2025-12-04T09:59:13.4884763Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4885487Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4885493Z 2025-12-04T09:59:13.4885726Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4885731Z 2025-12-04T09:59:13.4885878Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4885979Z Traceback (most recent call last): 2025-12-04T09:59:13.4886460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4886591Z getattr(self, test_name)() 2025-12-04T09:59:13.4887058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4887135Z fn() 2025-12-04T09:59:13.4887586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4887678Z method(*args, **kwargs) 2025-12-04T09:59:13.4888129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4888217Z method(*args, **kwargs) 2025-12-04T09:59:13.4888662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4888751Z with policy(): 2025-12-04T09:59:13.4889200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4889293Z raise RuntimeError(msg) 2025-12-04T09:59:13.4890483Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.4890492Z 2025-12-04T09:59:13.4890706Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4891431Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4891436Z 2025-12-04T09:59:13.4891667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4891672Z 2025-12-04T09:59:13.4891818Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4891922Z Traceback (most recent call last): 2025-12-04T09:59:13.4892402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4892506Z getattr(self, test_name)() 2025-12-04T09:59:13.4892977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4893084Z fn() 2025-12-04T09:59:13.4893532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4893620Z method(*args, **kwargs) 2025-12-04T09:59:13.4894066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4894156Z method(*args, **kwargs) 2025-12-04T09:59:13.4894598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4894694Z with policy(): 2025-12-04T09:59:13.4895142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4895267Z raise RuntimeError(msg) 2025-12-04T09:59:13.4896534Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4896541Z 2025-12-04T09:59:13.4896905Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4897729Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4897788Z 2025-12-04T09:59:13.4898052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4898058Z 2025-12-04T09:59:13.4898063Z 2025-12-04T09:59:13.4898287Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4898550Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.4899356Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-477ee10c9167da98.xml - 2025-12-04T09:59:13.4899526Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4900487Z FAILED [41.7880s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4900612Z Traceback (most recent call last): 2025-12-04T09:59:13.4901168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4901285Z getattr(self, test_name)() 2025-12-04T09:59:13.4901818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4901904Z fn() 2025-12-04T09:59:13.4902415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4902544Z method(*args, **kwargs) 2025-12-04T09:59:13.4903046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4903150Z method(*args, **kwargs) 2025-12-04T09:59:13.4903651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4903755Z with policy(): 2025-12-04T09:59:13.4904260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4904364Z raise RuntimeError(msg) 2025-12-04T09:59:13.4905748Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 720306176 and is now 10516103168. 
2025-12-04T09:59:13.4905757Z 2025-12-04T09:59:13.4905969Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4906784Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4906789Z 2025-12-04T09:59:13.4907050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4907058Z 2025-12-04T09:59:13.4907214Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4907336Z Traceback (most recent call last): 2025-12-04T09:59:13.4907878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4908023Z getattr(self, test_name)() 2025-12-04T09:59:13.4908552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4908639Z fn() 2025-12-04T09:59:13.4909249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4909345Z method(*args, **kwargs) 2025-12-04T09:59:13.4909821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4909913Z method(*args, **kwargs) 2025-12-04T09:59:13.4910416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4910511Z with policy(): 2025-12-04T09:59:13.4911084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4911182Z raise RuntimeError(msg) 2025-12-04T09:59:13.4912373Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.4912378Z 2025-12-04T09:59:13.4912562Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4913282Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4913288Z 2025-12-04T09:59:13.4913519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4913523Z 2025-12-04T09:59:13.4913670Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4913772Z Traceback (most recent call last): 2025-12-04T09:59:13.4914262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4914392Z getattr(self, test_name)() 2025-12-04T09:59:13.4914866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4914945Z fn() 2025-12-04T09:59:13.4915394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4915486Z method(*args, **kwargs) 2025-12-04T09:59:13.4915938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4916025Z method(*args, **kwargs) 2025-12-04T09:59:13.4916466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4916557Z with policy(): 2025-12-04T09:59:13.4917033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4917128Z raise RuntimeError(msg) 2025-12-04T09:59:13.4918313Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4918318Z 2025-12-04T09:59:13.4918505Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4919226Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4919257Z 2025-12-04T09:59:13.4919490Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4919653Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.4919811Z ====================== 1 failed, 26 deselected in 42.01s ======================= 2025-12-04T09:59:13.4919892Z Got exit code 1 2025-12-04T09:59:13.4920543Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4921046Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.4921870Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96eeb012f5f596ba.xml 2025-12-04T09:59:13.4922033Z ============================= test session starts ============================== 2025-12-04T09:59:13.4922382Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4922493Z cachedir: .pytest_cache 2025-12-04T09:59:13.4923006Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4923123Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4923232Z configfile: pytest.ini 2025-12-04T09:59:13.4923764Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4923985Z collecting ... collected 60 items / 11 deselected / 49 selected 2025-12-04T09:59:13.4924121Z stepcurrent: skipping 11 already run items. 2025-12-04T09:59:13.4924231Z Running 16 items in this shard 2025-12-04T09:59:13.4924236Z 2025-12-04T09:59:13.4925301Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda I1204 09:46:53.034000 65433 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 65485 2025-12-04T09:59:13.4925800Z I1204 09:46:53.035000 65433 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 65486 2025-12-04T09:59:13.4926352Z I1204 09:46:53.036000 65433 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 65487 2025-12-04T09:59:13.4926841Z I1204 09:46:53.036000 65433 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 65488 2025-12-04T09:59:13.4928880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4928984Z _warn_cpu_init() 2025-12-04T09:59:13.4931025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4931127Z _warn_cpu_init() 2025-12-04T09:59:13.4933126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4933265Z _warn_cpu_init() 2025-12-04T09:59:13.4935305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4935396Z _warn_cpu_init() 2025-12-04T09:59:13.4936396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4936509Z return func(*args, **kwargs) 2025-12-04T09:59:13.4937117Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4937659Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4938659Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4939166Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4940160Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4940555Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4941553Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4942047Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4943001Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4943494Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4944453Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4944931Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4945898Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4946385Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4948068Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 707723264 and is now 758054912. 2025-12-04T09:59:13.4948476Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4949238Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4950324Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.4950705Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4951374Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4951887Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4952316Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4952811Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4953765Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4954246Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4955269Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4955653Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4956502Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4956933Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4957789Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4958229Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4959097Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4959495Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4960347Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4960779Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4962301Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.4962619Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4963204Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4964248Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.4964579Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4965214Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4965696Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4966096Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4966559Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4967446Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4967895Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4968799Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4969147Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4969992Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4970426Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4971301Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4971735Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4972583Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4972978Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4973833Z [rank2]:E1204 09:47:01.501000 65487 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4974291Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4975779Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.4976096Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4976975Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4978134Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.4978505Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4979217Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4979758Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4980205Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4980732Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4981748Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4982280Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4983270Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4983664Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4984620Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4985112Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4986096Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4986585Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4987545Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4987996Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4989090Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4989530Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4991027Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.4991377Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4991964Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4992984Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.4993306Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4993936Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4994419Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4994511Z dist init r=2, world=4 2025-12-04T09:59:13.4994598Z dist init r=3, world=4 2025-12-04T09:59:13.4994688Z dist init r=1, world=4 2025-12-04T09:59:13.4994771Z dist init r=0, world=4 2025-12-04T09:59:13.4995821Z [rank0]:[W1204 09:47:01.469977700 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4995913Z FAILED [10.7961s] [ 6%] 2025-12-04T09:59:13.4995918Z 2025-12-04T09:59:13.4996044Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4996325Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda __ 2025-12-04T09:59:13.4996432Z Traceback (most recent call last): 2025-12-04T09:59:13.4996912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4997021Z self._join_processes(fn) 2025-12-04T09:59:13.4997537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4997699Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4998243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4998342Z raise RuntimeError(error) 2025-12-04T09:59:13.4998554Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4998658Z Traceback (most recent call last): 2025-12-04T09:59:13.4999133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4999239Z getattr(self, test_name)() 2025-12-04T09:59:13.4999712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4999817Z fn() 2025-12-04T09:59:13.5000269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5000360Z method(*args, **kwargs) 2025-12-04T09:59:13.5000814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5000908Z method(*args, **kwargs) 2025-12-04T09:59:13.5001352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5001443Z with policy(): 2025-12-04T09:59:13.5001896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5002018Z raise RuntimeError(msg) 2025-12-04T09:59:13.5003119Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.5003126Z 2025-12-04T09:59:13.5003319Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5003948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5003954Z 2025-12-04T09:59:13.5004186Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5004191Z 2025-12-04T09:59:13.5004195Z 2025-12-04T09:59:13.5004389Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5004622Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.5005334Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96eeb012f5f596ba.xml - 2025-12-04T09:59:13.5005492Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5006276Z FAILED [10.7961s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.5006392Z Traceback (most recent call last): 2025-12-04T09:59:13.5006876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5006973Z getattr(self, test_name)() 2025-12-04T09:59:13.5007458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5007538Z fn() 2025-12-04T09:59:13.5007993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5008087Z method(*args, **kwargs) 2025-12-04T09:59:13.5008534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5008662Z method(*args, **kwargs) 2025-12-04T09:59:13.5009111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5009193Z with policy(): 2025-12-04T09:59:13.5009645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5009742Z raise RuntimeError(msg) 2025-12-04T09:59:13.5010841Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.5010874Z 2025-12-04T09:59:13.5011063Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5011684Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5011696Z 2025-12-04T09:59:13.5011929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5012085Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5012248Z ====================== 1 failed, 11 deselected in 11.01s ======================= 2025-12-04T09:59:13.5012332Z Got exit code 1 2025-12-04T09:59:13.5012424Z Retrying single test... 
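The UserWarning from torch/distributed/fsdp/_init_utils.py that recurs in the sessions above and below recommends passing `device_id` when wrapping the module with FSDP, so that sharding initialization runs on GPU and `sync_module_states=True` can use GPU communication. A minimal sketch of that recommendation, assuming a generic nn.Module and the current CUDA device (illustrative only, not part of this test run):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(module: nn.Module) -> FSDP:
    # Passing device_id moves the CPU-resident module to the local GPU before
    # sharding initialization, addressing the _warn_cpu_init() warning, and it
    # also satisfies the GPU requirement of sync_module_states=True.
    return FSDP(
        module,
        device_id=torch.cuda.current_device(),
        sync_module_states=True,
    )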
2025-12-04T09:59:13.5013009Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc37fd9d84da442a.xml 2025-12-04T09:59:13.5013150Z ============================= test session starts ============================== 2025-12-04T09:59:13.5013461Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5013555Z cachedir: .pytest_cache 2025-12-04T09:59:13.5014012Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5014127Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5014219Z configfile: pytest.ini 2025-12-04T09:59:13.5014694Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5014894Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.5015581Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5015690Z Running 1 items in this shard 2025-12-04T09:59:13.5015695Z 2025-12-04T09:59:13.5016888Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda I1204 09:47:08.523000 65770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 65822 2025-12-04T09:59:13.5017427Z I1204 09:47:08.524000 65770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 65823 2025-12-04T09:59:13.5017927Z I1204 09:47:08.525000 65770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 65824 2025-12-04T09:59:13.5018413Z I1204 09:47:08.526000 65770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 65825 2025-12-04T09:59:13.5020447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5020577Z _warn_cpu_init() 2025-12-04T09:59:13.5022792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5022893Z _warn_cpu_init() 2025-12-04T09:59:13.5024899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5025129Z _warn_cpu_init() 2025-12-04T09:59:13.5027142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5027277Z _warn_cpu_init() 2025-12-04T09:59:13.5028275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5028394Z return func(*args, **kwargs) 2025-12-04T09:59:13.5028851Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5029391Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5030389Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5030897Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5031890Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5032287Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5033379Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5033841Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5034743Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5035198Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5036131Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5036555Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5037457Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5037921Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5039676Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T09:59:13.5040070Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5040706Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5041832Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5042208Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5042904Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5043447Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5043881Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5044399Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5045365Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5045853Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5046861Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5047246Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5048178Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5048649Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5049591Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5050085Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5051011Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5051445Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5052375Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5052861Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5054507Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.5054868Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5055497Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5056728Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5057266Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5057988Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5058542Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5058992Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5059532Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5060531Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5061034Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5062060Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5062457Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5063420Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5063908Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5064917Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5065403Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5066360Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5066812Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5067776Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5068419Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5070078Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 611254272 and is now 649003008. 
2025-12-04T09:59:13.5070440Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5071020Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5072052Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5072370Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5073002Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5073488Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5073888Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5074365Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5075276Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5075725Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5076600Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5076948Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5077808Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5078269Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5079131Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5079557Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5080403Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5080834Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5081686Z [rank3]:E1204 09:47:16.981000 65825 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5082125Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5083605Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.5083964Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5084546Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5085560Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5085886Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5086515Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5087010Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5087100Z dist init r=1, world=4 2025-12-04T09:59:13.5087185Z dist init r=3, world=4 2025-12-04T09:59:13.5087274Z dist init r=2, world=4 2025-12-04T09:59:13.5087361Z dist init r=0, world=4 2025-12-04T09:59:13.5088412Z [rank0]:[W1204 09:47:17.988381341 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5088501Z FAILED [10.8152s] [100%] 2025-12-04T09:59:13.5088506Z 2025-12-04T09:59:13.5088636Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5088925Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda __ 2025-12-04T09:59:13.5089030Z Traceback (most recent call last): 2025-12-04T09:59:13.5089519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5089618Z self._join_processes(fn) 2025-12-04T09:59:13.5090163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5090292Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5090827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5090925Z raise RuntimeError(error) 2025-12-04T09:59:13.5091135Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5091239Z Traceback (most recent call last): 2025-12-04T09:59:13.5091723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5091819Z getattr(self, test_name)() 2025-12-04T09:59:13.5092290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5092409Z fn() 2025-12-04T09:59:13.5092857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5092949Z method(*args, **kwargs) 2025-12-04T09:59:13.5093405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5093500Z method(*args, **kwargs) 2025-12-04T09:59:13.5093951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5094036Z with policy(): 2025-12-04T09:59:13.5094527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5094632Z raise RuntimeError(msg) 2025-12-04T09:59:13.5095717Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.5095725Z 2025-12-04T09:59:13.5095921Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5096615Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5096622Z 2025-12-04T09:59:13.5097048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5097063Z 2025-12-04T09:59:13.5097071Z 2025-12-04T09:59:13.5097289Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5097594Z Process 1 terminated with exit code 10, terminating remaining processes. 
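The ProcessGroupNCCL warning above notes that destroy_process_group() was not called before the worker processes exited (see the linked shutdown docs). Below is a minimal teardown sketch, not the test's actual harness code; it assumes the launcher already provides the usual MASTER_ADDR/MASTER_PORT environment for env:// initialization.

import torch
import torch.distributed as dist

def worker(rank: int, world_size: int) -> None:
    # Assumes MASTER_ADDR/MASTER_PORT are set by the launcher (env:// init).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        torch.cuda.set_device(rank)
        dist.barrier()
        # ... test or training body ...
    finally:
        # Explicit shutdown, per https://pytorch.org/docs/stable/distributed.html#shutdown
        dist.destroy_process_group()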
2025-12-04T09:59:13.5098399Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc37fd9d84da442a.xml - 2025-12-04T09:59:13.5098570Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5099475Z FAILED [10.8152s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5099596Z Traceback (most recent call last): 2025-12-04T09:59:13.5100147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5100267Z getattr(self, test_name)() 2025-12-04T09:59:13.5100803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5100892Z fn() 2025-12-04T09:59:13.5101404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5101507Z method(*args, **kwargs) 2025-12-04T09:59:13.5102045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5102152Z method(*args, **kwargs) 2025-12-04T09:59:13.5102652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5102755Z with policy(): 2025-12-04T09:59:13.5103261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5103366Z raise RuntimeError(msg) 2025-12-04T09:59:13.5104600Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.5104634Z 2025-12-04T09:59:13.5104846Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5105553Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5105559Z 2025-12-04T09:59:13.5105818Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5106004Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5106182Z ====================== 1 failed, 26 deselected in 11.03s ======================= 2025-12-04T09:59:13.5106307Z Got exit code 1 2025-12-04T09:59:13.5106416Z Retrying single test... 
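Before the harness retries the test below, note that the log already prints a complete local repro command. The sketch that follows simply drives that same command from Python, using only the command and environment variables shown above; it assumes it is run from the base repo dir with a CUDA machine available.

import os
import subprocess

env = dict(os.environ)
env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"   # enable the leak checker, as in the log
# env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"   # optional: suppress the repro banner
subprocess.run(
    [
        "python",
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda",
    ],
    env=env,
    check=True,
)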
2025-12-04T09:59:13.5107038Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cbd7e5f481e859be.xml 2025-12-04T09:59:13.5107194Z ============================= test session starts ============================== 2025-12-04T09:59:13.5107549Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5107657Z cachedir: .pytest_cache 2025-12-04T09:59:13.5108178Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5108299Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5108401Z configfile: pytest.ini 2025-12-04T09:59:13.5109155Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5109352Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.5110047Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5110146Z Running 1 items in this shard 2025-12-04T09:59:13.5110151Z 2025-12-04T09:59:13.5111123Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda I1204 09:47:23.993000 66107 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 66159 2025-12-04T09:59:13.5111572Z I1204 09:47:23.994000 66107 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 66160 2025-12-04T09:59:13.5112005Z I1204 09:47:23.995000 66107 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 66161 2025-12-04T09:59:13.5112440Z I1204 09:47:23.996000 66107 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 66162 2025-12-04T09:59:13.5114274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5114371Z _warn_cpu_init() 2025-12-04T09:59:13.5116158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5116249Z _warn_cpu_init() 2025-12-04T09:59:13.5118033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5118143Z _warn_cpu_init() 2025-12-04T09:59:13.5119929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5120042Z _warn_cpu_init() 2025-12-04T09:59:13.5121090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5121371Z return func(*args, **kwargs) 2025-12-04T09:59:13.5121843Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5122374Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5123379Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5123893Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5124881Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5125343Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5126298Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5126785Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5127754Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5128241Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5129241Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5129686Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5130651Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5131141Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5132857Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T09:59:13.5133222Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5133951Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5135014Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5135338Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5135981Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5136527Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5137130Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5137666Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5138669Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5139183Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5140212Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5140614Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5141568Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5142056Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5143046Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5143540Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5144502Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5144945Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5145918Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5146449Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5148135Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.5148505Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5149287Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5150422Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5150779Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5151478Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5152005Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5152438Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5152957Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5153930Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5154456Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5155414Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5155806Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5156847Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5157308Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5158348Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5158777Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5159635Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5160029Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5160915Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5161348Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5162825Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.5163188Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5163770Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5164797Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5165118Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5165753Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5166235Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5166633Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5167109Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5168026Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5168485Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5169356Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5169715Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5170602Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5171034Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5171886Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5172314Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5173168Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5173591Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5174450Z [rank1]:E1204 09:47:32.599000 66160 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5174882Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5176438Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.5177001Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5177668Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5178825Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5179181Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5179907Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5180456Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5180554Z dist init r=2, world=4 2025-12-04T09:59:13.5180662Z dist init r=0, world=4 2025-12-04T09:59:13.5180757Z dist init r=3, world=4 2025-12-04T09:59:13.5180883Z dist init r=1, world=4 2025-12-04T09:59:13.5182044Z [rank0]:[W1204 09:47:32.610414954 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5182142Z FAILED [10.4840s] [100%] 2025-12-04T09:59:13.5182148Z 2025-12-04T09:59:13.5182304Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5182617Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda __ 2025-12-04T09:59:13.5182734Z Traceback (most recent call last): 2025-12-04T09:59:13.5183288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5183399Z self._join_processes(fn) 2025-12-04T09:59:13.5184016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5184159Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5184761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5184881Z raise RuntimeError(error) 2025-12-04T09:59:13.5185111Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.5185234Z Traceback (most recent call last): 2025-12-04T09:59:13.5185771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5185909Z getattr(self, test_name)() 2025-12-04T09:59:13.5186447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5186536Z fn() 2025-12-04T09:59:13.5187044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5187151Z method(*args, **kwargs) 2025-12-04T09:59:13.5187652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5187757Z method(*args, **kwargs) 2025-12-04T09:59:13.5188262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5188390Z with policy(): 2025-12-04T09:59:13.5189009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5189114Z raise RuntimeError(msg) 2025-12-04T09:59:13.5190206Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T09:59:13.5190218Z 2025-12-04T09:59:13.5190406Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5191022Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5191028Z 2025-12-04T09:59:13.5191266Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5191273Z 2025-12-04T09:59:13.5191277Z 2025-12-04T09:59:13.5191473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5191710Z Process 0 terminated with exit code 10, terminating remaining processes. 
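Two of the UserWarnings in the run above suggest passing an explicit device: barrier() warns that `device_id` can be specified in `init_process_group`, and the FSDP `_init_utils` warning recommends the `device_id` argument so a CPU-constructed module is moved to GPU before sharding initialization (also needed for `sync_module_states=True`). A hedged sketch combining both suggestions follows; the model and process-group setup are placeholders, not the test's own code, and `device_id` for `init_process_group` assumes a recent PyTorch release.

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def setup_and_wrap(rank: int, world_size: int, model: torch.nn.Module) -> FSDP:
    device = torch.device("cuda", rank)
    # device_id here addresses the barrier() warning about the current device context.
    dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)
    # device_id here lets FSDP move the CPU module to GPU for sharding init,
    # which is also required when sync_module_states=True.
    return FSDP(model, device_id=device, sync_module_states=True)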
2025-12-04T09:59:13.5192422Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cbd7e5f481e859be.xml - 2025-12-04T09:59:13.5192602Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5193374Z FAILED [10.4840s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.5193478Z Traceback (most recent call last): 2025-12-04T09:59:13.5193974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5194072Z getattr(self, test_name)() 2025-12-04T09:59:13.5194541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5194629Z fn() 2025-12-04T09:59:13.5195074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5195209Z method(*args, **kwargs) 2025-12-04T09:59:13.5195654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5195746Z method(*args, **kwargs) 2025-12-04T09:59:13.5196196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5196278Z with policy(): 2025-12-04T09:59:13.5196727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5196828Z raise RuntimeError(msg) 2025-12-04T09:59:13.5197916Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T09:59:13.5197946Z 2025-12-04T09:59:13.5198142Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5198764Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5198768Z 2025-12-04T09:59:13.5199003Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5199160Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.5199340Z ====================== 1 failed, 26 deselected in 10.70s ======================= 2025-12-04T09:59:13.5199426Z Got exit code 1 2025-12-04T09:59:13.5199971Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5200331Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.5200889Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ede249f1a681285.xml 2025-12-04T09:59:13.5201029Z ============================= test session starts ============================== 2025-12-04T09:59:13.5201341Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5201433Z cachedir: .pytest_cache 2025-12-04T09:59:13.5201884Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5201997Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5202089Z configfile: pytest.ini 2025-12-04T09:59:13.5202561Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5202757Z collecting ... collected 60 items / 12 deselected / 48 selected 2025-12-04T09:59:13.5202880Z stepcurrent: skipping 12 already run items. 2025-12-04T09:59:13.5202986Z Running 15 items in this shard 2025-12-04T09:59:13.5202991Z 2025-12-04T09:59:13.5203962Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda I1204 09:47:39.283000 66444 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 66496 2025-12-04T09:59:13.5204406Z I1204 09:47:39.284000 66444 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 66497 2025-12-04T09:59:13.5204847Z I1204 09:47:39.285000 66444 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 66498 2025-12-04T09:59:13.5205280Z I1204 09:47:39.286000 66444 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 66499 2025-12-04T09:59:13.5206179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5206289Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5207155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5207260Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5208110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5208225Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5209074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.5209214Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5211006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5211409Z _warn_cpu_init() 2025-12-04T09:59:13.5213208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5213298Z _warn_cpu_init() 2025-12-04T09:59:13.5215069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5215156Z _warn_cpu_init() 2025-12-04T09:59:13.5217277Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5217375Z _warn_cpu_init() 2025-12-04T09:59:13.5218378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5218638Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5219623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5219887Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5221153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5221427Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5222411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5222673Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5223666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5223814Z return func(*args, **kwargs) 2025-12-04T09:59:13.5224601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5224709Z return func(*args, **kwargs) 2025-12-04T09:59:13.5225481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5225586Z return func(*args, **kwargs) 2025-12-04T09:59:13.5226342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5226497Z return func(*args, **kwargs) 2025-12-04T09:59:13.5227252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5227366Z return func(*args, **kwargs) 2025-12-04T09:59:13.5228122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5228225Z return func(*args, **kwargs) 2025-12-04T09:59:13.5228990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5229093Z return func(*args, **kwargs) 2025-12-04T09:59:13.5229844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5229955Z return func(*args, **kwargs) 2025-12-04T09:59:13.5230710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.5230822Z return func(*args, **kwargs) 2025-12-04T09:59:13.5231317Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5231850Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5233060Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5233521Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5234406Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5234855Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5235710Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5236143Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5236993Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5237458Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5238311Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5238906Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5239816Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5241401Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5243615Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 
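The FutureWarning above (torch/distributed/fsdp/wrap.py) says the `NO_SHARD` sharding strategy is deprecated and points users at `DistributedDataParallel`; the related state_dict warning notes that `NO_SHARD` returns a full_state_dict in any case. The following is a rough sketch of that suggested substitution, assuming an already-initialized NCCL process group; it is not the test's own wrapping code.

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_without_sharding(model: torch.nn.Module, rank: int) -> DDP:
    # Replicate the model per rank instead of using FSDP's deprecated NO_SHARD strategy.
    model = model.cuda(rank)
    return DDP(model, device_ids=[rank])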
2025-12-04T09:59:13.5245689Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5246772Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5248654Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5250218Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5251567Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5252964Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5254061Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5255141Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5257017Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5258663Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5260339Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5261861Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5263352Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5264990Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5271087Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5272618Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5274126Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5275610Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5277063Z [rank2]:E1204 09:47:47.695000 66498 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5278483Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5280542Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.5282491Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5283513Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5285249Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5286748Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5287830Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5289058Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5290054Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5291042Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5292559Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5294013Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5295449Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5297105Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5298608Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5300238Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5301818Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5303394Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5304967Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5306537Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5308084Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5309700Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5311741Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 1. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.5313673Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5314697Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5316473Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5317942Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5319003Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5320244Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5321594Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5322790Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5324460Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5326093Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5327717Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5329235Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5330776Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5332360Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5333990Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5335396Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5337108Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5338660Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5340202Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5341789Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5344110Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.5346293Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5347502Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5349503Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5350974Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5352051Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5353290Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5353980Z dist init r=0, world=4 2025-12-04T09:59:13.5354263Z dist init r=3, world=4 2025-12-04T09:59:13.5354500Z dist init r=1, world=4 2025-12-04T09:59:13.5354737Z dist init r=2, world=4 2025-12-04T09:59:13.5355906Z [rank0]:[W1204 09:47:48.718992126 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5357129Z FAILED [10.7200s] [ 6%] 2025-12-04T09:59:13.5357294Z 2025-12-04T09:59:13.5357422Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5357965Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.5358465Z Traceback (most recent call last): 2025-12-04T09:59:13.5359178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5359876Z self._join_processes(fn) 2025-12-04T09:59:13.5360575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5361329Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5362100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5362859Z raise RuntimeError(error) 2025-12-04T09:59:13.5363243Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5363699Z Traceback (most recent call last): 2025-12-04T09:59:13.5364383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5365243Z getattr(self, test_name)() 2025-12-04T09:59:13.5365934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5366653Z fn() 2025-12-04T09:59:13.5367248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5367937Z method(*args, **kwargs) 2025-12-04T09:59:13.5368595Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5369290Z method(*args, **kwargs) 2025-12-04T09:59:13.5369942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5370625Z with policy(): 2025-12-04T09:59:13.5371259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5371971Z raise RuntimeError(msg) 2025-12-04T09:59:13.5373546Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 1. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.5374875Z 2025-12-04T09:59:13.5375081Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5376098Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5377004Z 2025-12-04T09:59:13.5377443Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5377848Z 2025-12-04T09:59:13.5377853Z 2025-12-04T09:59:13.5378086Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5378703Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.5379934Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ede249f1a681285.xml - 2025-12-04T09:59:13.5381042Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5382227Z FAILED [10.7200s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5383324Z Traceback (most recent call last): 2025-12-04T09:59:13.5384103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5384905Z getattr(self, test_name)() 2025-12-04T09:59:13.5385649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5386431Z fn() 2025-12-04T09:59:13.5387062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5387807Z method(*args, **kwargs) 2025-12-04T09:59:13.5388605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5389331Z method(*args, **kwargs) 2025-12-04T09:59:13.5390101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5390794Z with policy(): 2025-12-04T09:59:13.5391410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5392149Z raise RuntimeError(msg) 2025-12-04T09:59:13.5393496Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 1. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.5394783Z 2025-12-04T09:59:13.5394990Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5395975Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5396958Z 2025-12-04T09:59:13.5397215Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5397774Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5398249Z ====================== 1 failed, 12 deselected in 10.94s ======================= 2025-12-04T09:59:13.5398638Z Got exit code 1 2025-12-04T09:59:13.5398880Z Retrying single test... 2025-12-04T09:59:13.5399655Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-11be05c94e086d26.xml 2025-12-04T09:59:13.5400540Z ============================= test session starts ============================== 2025-12-04T09:59:13.5401187Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5401750Z cachedir: .pytest_cache 2025-12-04T09:59:13.5402541Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5403257Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5403577Z configfile: pytest.ini 2025-12-04T09:59:13.5404245Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5405076Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.5406131Z stepcurrent: skipping 12 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5407091Z Running 1 items in this shard 2025-12-04T09:59:13.5407283Z 2025-12-04T09:59:13.5408438Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda I1204 09:47:54.704000 66781 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 66833 2025-12-04T09:59:13.5409937Z I1204 09:47:54.704000 66781 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 66834 2025-12-04T09:59:13.5410925Z I1204 09:47:54.705000 66781 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 66835 2025-12-04T09:59:13.5411921Z I1204 09:47:54.706000 66781 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 66836 2025-12-04T09:59:13.5413345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.5414470Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5416557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5418988Z _warn_cpu_init() 2025-12-04T09:59:13.5420119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5421598Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5422824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5424033Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5426274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5428508Z _warn_cpu_init() 2025-12-04T09:59:13.5430749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5433173Z _warn_cpu_init() 2025-12-04T09:59:13.5434189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5435409Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5436882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5438038Z return func(*args, **kwargs) 2025-12-04T09:59:13.5439186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5440329Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5441479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5442772Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5444070Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5445394Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5447852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5450012Z _warn_cpu_init() 2025-12-04T09:59:13.5451117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5452577Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5453669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5454616Z return func(*args, **kwargs) 2025-12-04T09:59:13.5455526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5456540Z return func(*args, **kwargs) 2025-12-04T09:59:13.5457663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5458670Z return func(*args, **kwargs) 2025-12-04T09:59:13.5459635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5460633Z return func(*args, **kwargs) 2025-12-04T09:59:13.5461584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5462629Z return func(*args, **kwargs) 2025-12-04T09:59:13.5463581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5464576Z return func(*args, **kwargs) 2025-12-04T09:59:13.5465523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5466520Z return func(*args, **kwargs) 2025-12-04T09:59:13.5467474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
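The `_init_utils.py` UserWarning repeated above recommends passing `device_id` so that FSDP runs its sharding initialization on the GPU rather than on the CPU, and so that `sync_module_states=True` has the module where it needs it. A minimal sketch of that call pattern follows; the Linear module is a placeholder and an already-initialized process group is assumed.

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_gpu(rank: int) -> FSDP:
    # Assumes torch.distributed.init_process_group(...) has already run in this process;
    # the Linear module is a stand-in for the real model under test.
    model = torch.nn.Linear(8, 8)  # still on CPU at this point
    return FSDP(
        model,
        device_id=torch.device("cuda", rank),  # run sharding init on this GPU
        sync_module_states=True,               # per the warning, requires the module on GPU
    )
```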
2025-12-04T09:59:13.5468471Z return func(*args, **kwargs) 2025-12-04T09:59:13.5469345Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5470348Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5471826Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5473282Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5474731Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5476113Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5477442Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5478856Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5480257Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5481679Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5483087Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5484458Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5485814Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5487219Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5489269Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
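The RuntimeError above comes from the CUDA memory-leak check that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: the harness records per-device memory before the test body and compares the numbers on exit, failing when both the caching-allocator and the driver-level figures have grown. The sketch below only illustrates that idea and is not the actual check in torch/testing/_internal/common_utils.py; it assumes torch.cuda.memory_allocated for the allocator figure and torch.cuda.mem_get_info for the driver figure.

```python
import torch

class SimpleCudaLeakCheck:
    """Illustrative stand-in for the harness's leak check; not the PyTorch implementation."""

    def __enter__(self):
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        self.before = []
        for dev in range(torch.cuda.device_count()):
            allocated = torch.cuda.memory_allocated(dev)  # caching-allocator bytes
            free, total = torch.cuda.mem_get_info(dev)    # driver-level view of the device
            self.before.append((allocated, total - free))
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # let the test's own failure propagate untouched
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        for dev, (alloc_before, driver_before) in enumerate(self.before):
            alloc_after = torch.cuda.memory_allocated(dev)
            free, total = torch.cuda.mem_get_info(dev)
            driver_after = total - free
            if alloc_after > alloc_before and driver_after > driver_before:
                raise RuntimeError(
                    f"possible CUDA leak on device {dev}: allocator {alloc_before} -> "
                    f"{alloc_after}, driver {driver_before} -> {driver_after}"
                )
        return False
```

Wrapping a test body in `with SimpleCudaLeakCheck():` performs the same kind of before/after comparison that produced the numbers quoted in the failure message above.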
2025-12-04T09:59:13.5491227Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5492252Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5493975Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5495448Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5496600Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5498197Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5499327Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5500436Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5502098Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5503744Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5505411Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5506923Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5508415Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5510078Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5511609Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5513007Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5514413Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5515770Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5517139Z [rank2]:E1204 09:48:03.060000 66835 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5518547Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5520624Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.5523070Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5524221Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5526178Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5527895Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5529105Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5530486Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5531606Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5532719Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5534390Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5535882Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5537652Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5539169Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5540724Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5542306Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5543883Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5545455Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5547031Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5548567Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5550087Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5551491Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5553570Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 1. CUDA driver allocated memory was 598671360 and is now 651100160. 2025-12-04T09:59:13.5555506Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5556540Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5558297Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5559761Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5560828Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5562060Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5563057Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5564077Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5565552Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5567002Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5568443Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5569828Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5571157Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5572556Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5573961Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5575355Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5577017Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5578561Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5580141Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5581735Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5584058Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T09:59:13.5586237Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5587417Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5589524Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5590989Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5592062Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5593291Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5594008Z dist init r=1, world=4 2025-12-04T09:59:13.5594245Z dist init r=3, world=4 2025-12-04T09:59:13.5594475Z dist init r=0, world=4 2025-12-04T09:59:13.5594701Z dist init r=2, world=4 2025-12-04T09:59:13.5595880Z [rank0]:[W1204 09:48:03.083584524 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5597098Z FAILED [10.0360s] [100%] 2025-12-04T09:59:13.5597258Z 2025-12-04T09:59:13.5597394Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5597959Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.5598467Z Traceback (most recent call last): 2025-12-04T09:59:13.5599147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5599843Z self._join_processes(fn) 2025-12-04T09:59:13.5600538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5601294Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5602061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5602812Z raise RuntimeError(error) 2025-12-04T09:59:13.5603191Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.5603612Z Traceback (most recent call last): 2025-12-04T09:59:13.5604296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5604981Z getattr(self, test_name)() 2025-12-04T09:59:13.5605636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5606309Z fn() 2025-12-04T09:59:13.5606878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5607560Z method(*args, **kwargs) 2025-12-04T09:59:13.5608178Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5608837Z method(*args, **kwargs) 2025-12-04T09:59:13.5609445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5610098Z with policy(): 2025-12-04T09:59:13.5610695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5611366Z raise RuntimeError(msg) 2025-12-04T09:59:13.5612664Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T09:59:13.5613880Z 2025-12-04T09:59:13.5614068Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5614993Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5615735Z 2025-12-04T09:59:13.5615972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5616390Z 2025-12-04T09:59:13.5616395Z 2025-12-04T09:59:13.5616599Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5617372Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.5618607Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-11be05c94e086d26.xml - 2025-12-04T09:59:13.5619704Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5621081Z FAILED [10.0360s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.5622199Z Traceback (most recent call last): 2025-12-04T09:59:13.5622986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5623839Z getattr(self, test_name)() 2025-12-04T09:59:13.5624571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5625323Z fn() 2025-12-04T09:59:13.5625956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5626697Z method(*args, **kwargs) 2025-12-04T09:59:13.5627390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5628139Z method(*args, **kwargs) 2025-12-04T09:59:13.5628826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5629556Z with policy(): 2025-12-04T09:59:13.5630221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5630972Z raise RuntimeError(msg) 2025-12-04T09:59:13.5632406Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T09:59:13.5633870Z 2025-12-04T09:59:13.5634062Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5635030Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5635779Z 2025-12-04T09:59:13.5636011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5636515Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5636941Z ====================== 1 failed, 26 deselected in 10.25s ======================= 2025-12-04T09:59:13.5637304Z Got exit code 1 2025-12-04T09:59:13.5637530Z Retrying single test... 2025-12-04T09:59:13.5638234Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16966e8ed8e62900.xml 2025-12-04T09:59:13.5639032Z ============================= test session starts ============================== 2025-12-04T09:59:13.5639632Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5640150Z cachedir: .pytest_cache 2025-12-04T09:59:13.5640752Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5641429Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5641723Z configfile: pytest.ini 2025-12-04T09:59:13.5642358Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5643137Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.5644146Z stepcurrent: skipping 12 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5645083Z Running 1 items in this shard 2025-12-04T09:59:13.5645267Z 2025-12-04T09:59:13.5646221Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda I1204 09:48:09.714000 67118 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 67170 2025-12-04T09:59:13.5647709Z I1204 09:48:09.715000 67118 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 67171 2025-12-04T09:59:13.5648702Z I1204 09:48:09.715000 67118 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 67172 2025-12-04T09:59:13.5649687Z I1204 09:48:09.716000 67118 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 67173 2025-12-04T09:59:13.5651150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5652232Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5653306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5654391Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5655455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5656606Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5657978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5659196Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5661514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5663747Z _warn_cpu_init() 2025-12-04T09:59:13.5665899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5668126Z _warn_cpu_init() 2025-12-04T09:59:13.5670277Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5672260Z _warn_cpu_init() 2025-12-04T09:59:13.5674181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5676278Z _warn_cpu_init() 2025-12-04T09:59:13.5677288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5678509Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5679735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5680979Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5682198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5683409Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5684632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5686061Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5687366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5688518Z return func(*args, **kwargs) 2025-12-04T09:59:13.5689434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5690384Z return func(*args, **kwargs) 2025-12-04T09:59:13.5691324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5692268Z return func(*args, **kwargs) 2025-12-04T09:59:13.5693171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5694308Z return func(*args, **kwargs) 2025-12-04T09:59:13.5695240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5696199Z return func(*args, **kwargs) 2025-12-04T09:59:13.5697375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5698410Z return func(*args, **kwargs) 2025-12-04T09:59:13.5699377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5700373Z return func(*args, **kwargs) 2025-12-04T09:59:13.5701329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5702322Z return func(*args, **kwargs) 2025-12-04T09:59:13.5703279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
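The FutureWarning repeated throughout these sessions deprecates FSDP's `NO_SHARD` strategy in favor of DistributedDataParallel. A hedged sketch of that suggested replacement, again with a placeholder model and an already-initialized process group assumed:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_without_sharding(rank: int) -> DDP:
    # Assumes torch.distributed.init_process_group("nccl", ...) has already run.
    model = torch.nn.Linear(8, 8).to(torch.device("cuda", rank))  # placeholder model
    return DDP(model, device_ids=[rank])  # full replication, the DDP analogue of NO_SHARD
```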
2025-12-04T09:59:13.5704296Z return func(*args, **kwargs) 2025-12-04T09:59:13.5704956Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5706076Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5707744Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5709464Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5711200Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5712776Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5714182Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5715673Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5717156Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5718645Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5720135Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5721991Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5723534Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5725121Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5727441Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. CUDA driver allocated memory was 707723264 and is now 760152064. 
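The ProcessGroupNCCL warning emitted after each failed attempt ("destroy_process_group() was not called before program exit, which can leak resources") points at the usual torch.distributed cleanup pattern. A sketch follows, with illustrative rendezvous settings rather than the test harness's own store handling:

```python
import os
import torch
import torch.distributed as dist

def run(rank: int, world_size: int) -> None:
    # Illustrative single-node rendezvous; the FSDP tests use their own store/port setup.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        torch.cuda.set_device(rank)
        # ... per-rank test body ...
    finally:
        dist.destroy_process_group()  # explicit shutdown avoids the ProcessGroupNCCL warning
```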
2025-12-04T09:59:13.5729658Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5730816Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5732767Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5734523Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5735599Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5737142Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5738282Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5739387Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5741053Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5742731Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5744361Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5745875Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5747359Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5749045Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5750454Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5751850Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5753286Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5754647Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5756010Z [rank2]:E1204 09:48:18.008000 67172 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5757414Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5759497Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.5761430Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5762449Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5764190Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5765690Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5766763Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5767999Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5769001Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5769991Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5771500Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5772952Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5774388Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5775729Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5777376Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5778959Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5780548Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5782159Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5783743Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5785280Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5786821Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5788403Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5790663Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.5792601Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5793630Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5795357Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5796857Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5797926Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5799163Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5800193Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5801179Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5802658Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5804112Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5805549Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5806896Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5808223Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5809625Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5811051Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5812448Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5813855Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5815210Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5816850Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5818454Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5820934Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.5823154Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5824377Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5826322Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5827970Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5829184Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5830617Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5831393Z dist init r=2, world=4 2025-12-04T09:59:13.5831662Z dist init r=3, world=4 2025-12-04T09:59:13.5831924Z dist init r=1, world=4 2025-12-04T09:59:13.5832188Z dist init r=0, world=4 2025-12-04T09:59:13.5833584Z [rank0]:[W1204 09:48:18.075801567 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5834895Z FAILED [10.0962s] [100%] 2025-12-04T09:59:13.5835065Z 2025-12-04T09:59:13.5835210Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5835787Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.5836327Z Traceback (most recent call last): 2025-12-04T09:59:13.5837048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5837789Z self._join_processes(fn) 2025-12-04T09:59:13.5838520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5839330Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5840376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5841201Z raise RuntimeError(error) 2025-12-04T09:59:13.5841720Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.5842168Z Traceback (most recent call last): 2025-12-04T09:59:13.5842888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5843628Z getattr(self, test_name)() 2025-12-04T09:59:13.5844324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5845042Z fn() 2025-12-04T09:59:13.5845639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5846371Z method(*args, **kwargs) 2025-12-04T09:59:13.5847033Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5847735Z method(*args, **kwargs) 2025-12-04T09:59:13.5848383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5849259Z with policy(): 2025-12-04T09:59:13.5849905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5850672Z raise RuntimeError(msg) 2025-12-04T09:59:13.5852272Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.5852313Z 2025-12-04T09:59:13.5852518Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5853197Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5853202Z 2025-12-04T09:59:13.5853447Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5853452Z 2025-12-04T09:59:13.5853456Z 2025-12-04T09:59:13.5853673Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5853962Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.5854725Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16966e8ed8e62900.xml - 2025-12-04T09:59:13.5854887Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5855889Z FAILED [10.0962s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.5856014Z Traceback (most recent call last): 2025-12-04T09:59:13.5856630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5856740Z getattr(self, test_name)() 2025-12-04T09:59:13.5857447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5857542Z fn() 2025-12-04T09:59:13.5858050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5858153Z method(*args, **kwargs) 2025-12-04T09:59:13.5858656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5858763Z method(*args, **kwargs) 2025-12-04T09:59:13.5859299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5859399Z with policy(): 2025-12-04T09:59:13.5859905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5860009Z raise RuntimeError(msg) 2025-12-04T09:59:13.5861258Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.5861269Z 2025-12-04T09:59:13.5861483Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5862230Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5862236Z 2025-12-04T09:59:13.5862496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5862674Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5862854Z ====================== 1 failed, 26 deselected in 10.31s ======================= 2025-12-04T09:59:13.5862947Z Got exit code 1 2025-12-04T09:59:13.5863589Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5863991Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.5864644Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90420efea6f00dc5.xml 2025-12-04T09:59:13.5864810Z ============================= test session starts ============================== 2025-12-04T09:59:13.5865160Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5865263Z cachedir: .pytest_cache 2025-12-04T09:59:13.5865784Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5865902Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5866013Z configfile: pytest.ini 2025-12-04T09:59:13.5866575Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5866789Z collecting ... collected 60 items / 13 deselected / 47 selected 2025-12-04T09:59:13.5866939Z stepcurrent: skipping 13 already run items. 2025-12-04T09:59:13.5867047Z Running 14 items in this shard 2025-12-04T09:59:13.5867052Z 2025-12-04T09:59:13.5868123Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda I1204 09:48:24.703000 67455 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 67507 2025-12-04T09:59:13.5868620Z I1204 09:48:24.704000 67455 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 67508 2025-12-04T09:59:13.5869221Z I1204 09:48:24.705000 67455 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 67509 2025-12-04T09:59:13.5869823Z I1204 09:48:24.706000 67455 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 67510 2025-12-04T09:59:13.5871796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5871900Z _warn_cpu_init() 2025-12-04T09:59:13.5874007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5874130Z _warn_cpu_init() 2025-12-04T09:59:13.5876155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5876259Z _warn_cpu_init() 2025-12-04T09:59:13.5878202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5878328Z _warn_cpu_init() 2025-12-04T09:59:13.5879307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.5879416Z return func(*args, **kwargs) 2025-12-04T09:59:13.5879878Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5880395Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5881373Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5881895Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5882859Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5883258Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5884189Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5884664Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5885593Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5886071Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5887018Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5887451Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5888388Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5888863Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5890523Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 602865664 and is now 625934336. 
2025-12-04T09:59:13.5890876Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5891522Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5892631Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5893432Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5894133Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5894660Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5895099Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5895614Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5896706Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5897393Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5898380Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5898778Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5899763Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5900257Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5901220Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5901752Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5902709Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5903152Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5904113Z [rank0]:E1204 09:48:33.744000 67507 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5904628Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5906306Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 709820416 and is now 734986240. 2025-12-04T09:59:13.5906671Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5907337Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5908515Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5908986Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5909687Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5910210Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5910674Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5911182Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5912159Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5912650Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5913609Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5913998Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5915039Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5915508Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5916439Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5916996Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5917849Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5918247Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5919143Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5919583Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5921387Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.5921760Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5922487Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5923633Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5923990Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5924715Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5925335Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5925790Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5926323Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5927321Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5927838Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5928825Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5929231Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5930240Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5930738Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5931694Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5932183Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5933146Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5933732Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5934597Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5935028Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5936589Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T09:59:13.5937164Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5937836Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5938980Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5939340Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5940092Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5940641Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5940753Z dist init r=1, world=4 2025-12-04T09:59:13.5940855Z dist init r=2, world=4 2025-12-04T09:59:13.5940950Z dist init r=3, world=4 2025-12-04T09:59:13.5941055Z dist init r=0, world=4 2025-12-04T09:59:13.5942215Z [rank0]:[W1204 09:48:34.770306676 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5942324Z FAILED [11.0968s] [ 7%] 2025-12-04T09:59:13.5942333Z 2025-12-04T09:59:13.5942481Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5942791Z __ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda __ 2025-12-04T09:59:13.5942922Z Traceback (most recent call last): 2025-12-04T09:59:13.5943471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5943582Z self._join_processes(fn) 2025-12-04T09:59:13.5944211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5944352Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5944963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5945077Z raise RuntimeError(error) 2025-12-04T09:59:13.5945313Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5945438Z Traceback (most recent call last): 2025-12-04T09:59:13.5945978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5946091Z getattr(self, test_name)() 2025-12-04T09:59:13.5946665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5946751Z fn() 2025-12-04T09:59:13.5947269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5947370Z method(*args, **kwargs) 2025-12-04T09:59:13.5947873Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5947987Z method(*args, **kwargs) 2025-12-04T09:59:13.5948489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5948698Z with policy(): 2025-12-04T09:59:13.5949315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5949451Z raise RuntimeError(msg) 2025-12-04T09:59:13.5950619Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 602865664 and is now 625934336. 2025-12-04T09:59:13.5950625Z 2025-12-04T09:59:13.5950824Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5951488Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5951520Z 2025-12-04T09:59:13.5951768Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5951773Z 2025-12-04T09:59:13.5951778Z 2025-12-04T09:59:13.5951980Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5952235Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.5952994Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90420efea6f00dc5.xml - 2025-12-04T09:59:13.5953168Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5954154Z FAILED [11.0968s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5954272Z Traceback (most recent call last): 2025-12-04T09:59:13.5954810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5954920Z getattr(self, test_name)() 2025-12-04T09:59:13.5955449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5955535Z fn() 2025-12-04T09:59:13.5956023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5956131Z method(*args, **kwargs) 2025-12-04T09:59:13.5956650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5956749Z method(*args, **kwargs) 2025-12-04T09:59:13.5957243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5957336Z with policy(): 2025-12-04T09:59:13.5957838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5957940Z raise RuntimeError(msg) 2025-12-04T09:59:13.5959180Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 602865664 and is now 625934336. 2025-12-04T09:59:13.5959189Z 2025-12-04T09:59:13.5959407Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5960100Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5960106Z 2025-12-04T09:59:13.5960367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5960540Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5960714Z ====================== 1 failed, 13 deselected in 11.31s ======================= 2025-12-04T09:59:13.5960812Z Got exit code 1 2025-12-04T09:59:13.5960912Z Retrying single test... 2025-12-04T09:59:13.5961557Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c9f36ab2b8b15ae.xml 2025-12-04T09:59:13.5961711Z ============================= test session starts ============================== 2025-12-04T09:59:13.5962049Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5962160Z cachedir: .pytest_cache 2025-12-04T09:59:13.5962657Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5962770Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5962880Z configfile: pytest.ini 2025-12-04T09:59:13.5963397Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5963638Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.5964393Z stepcurrent: skipping 13 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5964503Z Running 1 items in this shard 2025-12-04T09:59:13.5964511Z 2025-12-04T09:59:13.5965536Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda I1204 09:48:40.504000 67792 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 67844 2025-12-04T09:59:13.5966017Z I1204 09:48:40.505000 67792 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 67845 2025-12-04T09:59:13.5966510Z I1204 09:48:40.505000 67792 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 67846 2025-12-04T09:59:13.5966989Z I1204 09:48:40.506000 67792 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 67847 2025-12-04T09:59:13.5969103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.5969204Z _warn_cpu_init() 2025-12-04T09:59:13.5971102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5971199Z _warn_cpu_init() 2025-12-04T09:59:13.5972352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5972473Z return func(*args, **kwargs) 2025-12-04T09:59:13.5974418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5974526Z _warn_cpu_init() 2025-12-04T09:59:13.5976556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.5976889Z _warn_cpu_init() 2025-12-04T09:59:13.5977354Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5977889Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5978933Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5979444Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5980450Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5980854Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5981831Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5982318Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5983280Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5983808Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5984769Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5985229Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5986197Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5986697Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5988478Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 716111872 and is now 734986240. 
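The _warn_cpu_init UserWarnings above fire because the test hands FSDP a CPU-resident module, so sharding initialization runs on CPU and sync_module_states=True cannot use GPU collectives. The warning's own suggestion is to pass device_id; a minimal sketch of that, with every other FSDP kwarg (auto_wrap_policy, cpu_offload, ...) omitted and the wrap_on_gpu name purely illustrative:

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_gpu(module: torch.nn.Module, rank: int) -> FSDP:
    # device_id tells FSDP to move the CPU module to this GPU before running
    # sharding initialization and the sync_module_states collectives.
    return FSDP(module, device_id=torch.device("cuda", rank))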
2025-12-04T09:59:13.5988948Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5989575Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5990655Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5991039Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5991716Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5992238Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5992659Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5993189Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5994137Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5994616Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5995553Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5995926Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5996836Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5997297Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5998232Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5998697Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5999603Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6000028Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6000932Z [rank1]:E1204 09:48:49.506000 67845 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6001429Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6003010Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 1. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:13.6003360Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6003974Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6005087Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6005433Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6006108Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6006631Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6007081Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6007583Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6008735Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6009225Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6010184Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6010574Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6011514Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6012021Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6012957Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6013442Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6014367Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6014810Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6015774Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6016256Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6018147Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.6018566Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6019223Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6020368Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6020925Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6021740Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6022295Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6022748Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6023282Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6024298Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6024805Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6025809Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6026209Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6027220Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6027707Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6028661Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6029160Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6030170Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6030626Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6031598Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6032096Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6033828Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.6034226Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6034844Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6035920Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6036297Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6036969Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6037494Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6037593Z dist init r=1, world=4 2025-12-04T09:59:13.6037685Z dist init r=3, world=4 2025-12-04T09:59:13.6037786Z dist init r=0, world=4 2025-12-04T09:59:13.6037879Z dist init r=2, world=4 2025-12-04T09:59:13.6038972Z [rank0]:[W1204 09:48:49.524393723 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6039068Z FAILED [11.1211s] [100%] 2025-12-04T09:59:13.6039074Z 2025-12-04T09:59:13.6039209Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6039512Z __ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda __ 2025-12-04T09:59:13.6039624Z Traceback (most recent call last): 2025-12-04T09:59:13.6040140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6040280Z self._join_processes(fn) 2025-12-04T09:59:13.6041010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6041160Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6041748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6041862Z raise RuntimeError(error) 2025-12-04T09:59:13.6042093Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.6042208Z Traceback (most recent call last): 2025-12-04T09:59:13.6042744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6042853Z getattr(self, test_name)() 2025-12-04T09:59:13.6043400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6043496Z fn() 2025-12-04T09:59:13.6044013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6044115Z method(*args, **kwargs) 2025-12-04T09:59:13.6044611Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6044713Z method(*args, **kwargs) 2025-12-04T09:59:13.6045204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6045296Z with policy(): 2025-12-04T09:59:13.6045820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6045934Z raise RuntimeError(msg) 2025-12-04T09:59:13.6047216Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.6047224Z 2025-12-04T09:59:13.6047434Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6048104Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6048138Z 2025-12-04T09:59:13.6048386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6048392Z 2025-12-04T09:59:13.6048407Z 2025-12-04T09:59:13.6048610Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6048857Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6049621Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c9f36ab2b8b15ae.xml - 2025-12-04T09:59:13.6049779Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6050578Z FAILED [11.1211s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.6050700Z Traceback (most recent call last): 2025-12-04T09:59:13.6051219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6051329Z getattr(self, test_name)() 2025-12-04T09:59:13.6051835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6051921Z fn() 2025-12-04T09:59:13.6052434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6052535Z method(*args, **kwargs) 2025-12-04T09:59:13.6053021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6053121Z method(*args, **kwargs) 2025-12-04T09:59:13.6053598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6053700Z with policy(): 2025-12-04T09:59:13.6054177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6054278Z raise RuntimeError(msg) 2025-12-04T09:59:13.6055469Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.6055475Z 2025-12-04T09:59:13.6055674Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6056426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6056433Z 2025-12-04T09:59:13.6056856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6057058Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6057236Z ====================== 1 failed, 26 deselected in 11.34s ======================= 2025-12-04T09:59:13.6057371Z Got exit code 1 2025-12-04T09:59:13.6057489Z Retrying single test... 2025-12-04T09:59:13.6058120Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d4c1fd96adc2be7.xml 2025-12-04T09:59:13.6058279Z ============================= test session starts ============================== 2025-12-04T09:59:13.6058637Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6058741Z cachedir: .pytest_cache 2025-12-04T09:59:13.6059260Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6059380Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6059514Z configfile: pytest.ini 2025-12-04T09:59:13.6060054Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6060266Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6061048Z stepcurrent: skipping 13 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6061169Z Running 1 items in this shard 2025-12-04T09:59:13.6061177Z 2025-12-04T09:59:13.6062229Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda I1204 09:48:56.294000 68129 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 68181 2025-12-04T09:59:13.6062734Z I1204 09:48:56.295000 68129 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 68182 2025-12-04T09:59:13.6063231Z I1204 09:48:56.295000 68129 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 68183 2025-12-04T09:59:13.6063730Z I1204 09:48:56.296000 68129 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 68184 2025-12-04T09:59:13.6065802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.6065908Z _warn_cpu_init() 2025-12-04T09:59:13.6067958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6068062Z _warn_cpu_init() 2025-12-04T09:59:13.6070141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6070232Z _warn_cpu_init() 2025-12-04T09:59:13.6072135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6072253Z _warn_cpu_init() 2025-12-04T09:59:13.6073198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
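The c10d_logger UserWarning just above points at the same fix on the process-group side: pass `device_id` to `init_process_group` so collectives such as `barrier()` know which CUDA device to use. A minimal sketch, assuming MASTER_ADDR/MASTER_PORT are set in the environment and one GPU per rank:

import torch
import torch.distributed as dist

def init_distributed(rank: int, world_size: int) -> None:
    torch.cuda.set_device(rank)
    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device("cuda", rank),  # mutes the barrier() device warning
    )
    dist.barrier()  # now runs on the declared device instead of guessing from context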
2025-12-04T09:59:13.6073302Z return func(*args, **kwargs) 2025-12-04T09:59:13.6073734Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6074239Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6075203Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6075693Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6076623Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6077004Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6077907Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6078365Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6079467Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6079986Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6080923Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6081356Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6082318Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6082824Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6084466Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
2025-12-04T09:59:13.6084826Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6085464Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6086591Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6086974Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6087679Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6088207Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6088671Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6089194Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6090277Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6090763Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6091694Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6092081Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6092988Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6093452Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6094395Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6094852Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6095760Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6096181Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6097406Z [rank2]:E1204 09:49:05.165000 68183 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6097907Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6099603Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.6099969Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6100657Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6101816Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6102179Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6102909Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6103480Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6103952Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6104486Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6105493Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6106014Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6106998Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6107413Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6108379Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6109008Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6109922Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6110381Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6111296Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6111738Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6112675Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6113142Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6114726Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.6115098Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6115724Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6116811Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6117155Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6117869Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6118385Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6118821Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6119324Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6120258Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6120877Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6122041Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6122454Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6123481Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6123983Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6124943Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6125428Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6126442Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6126892Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6127858Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6128345Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6130043Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.6130447Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6131107Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6132262Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6132677Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6133403Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6134048Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6134151Z dist init r=0, world=4 2025-12-04T09:59:13.6134245Z dist init r=1, world=4 2025-12-04T09:59:13.6134334Z dist init r=3, world=4 2025-12-04T09:59:13.6134434Z dist init r=2, world=4 2025-12-04T09:59:13.6135522Z [rank0]:[W1204 09:49:05.176575241 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6135621Z FAILED [11.2195s] [100%] 2025-12-04T09:59:13.6135638Z 2025-12-04T09:59:13.6135780Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6136071Z __ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda __ 2025-12-04T09:59:13.6136193Z Traceback (most recent call last): 2025-12-04T09:59:13.6136992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6137110Z self._join_processes(fn) 2025-12-04T09:59:13.6137708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6137849Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6138462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6138575Z raise RuntimeError(error) 2025-12-04T09:59:13.6138807Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6138935Z Traceback (most recent call last): 2025-12-04T09:59:13.6139508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6139617Z getattr(self, test_name)() 2025-12-04T09:59:13.6140164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6140253Z fn() 2025-12-04T09:59:13.6140765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6140869Z method(*args, **kwargs) 2025-12-04T09:59:13.6141373Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6141491Z method(*args, **kwargs) 2025-12-04T09:59:13.6141989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6142115Z with policy(): 2025-12-04T09:59:13.6142635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6142752Z raise RuntimeError(msg) 2025-12-04T09:59:13.6143977Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 2025-12-04T09:59:13.6143983Z 2025-12-04T09:59:13.6144199Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6144935Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6144940Z 2025-12-04T09:59:13.6145205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6145212Z 2025-12-04T09:59:13.6145380Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.6145510Z Traceback (most recent call last): 2025-12-04T09:59:13.6146062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6146172Z getattr(self, test_name)() 2025-12-04T09:59:13.6146712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6146799Z fn() 2025-12-04T09:59:13.6147317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6147424Z method(*args, **kwargs) 2025-12-04T09:59:13.6147930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6148041Z method(*args, **kwargs) 2025-12-04T09:59:13.6148543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6148648Z with policy(): 2025-12-04T09:59:13.6149290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6149392Z raise RuntimeError(msg) 2025-12-04T09:59:13.6150551Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T09:59:13.6150558Z 2025-12-04T09:59:13.6150760Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6151426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6151433Z 2025-12-04T09:59:13.6151681Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6151685Z 2025-12-04T09:59:13.6151717Z 2025-12-04T09:59:13.6151927Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6152178Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6152935Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d4c1fd96adc2be7.xml - 2025-12-04T09:59:13.6153105Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6153906Z FAILED [11.2195s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6154048Z Traceback (most recent call last): 2025-12-04T09:59:13.6154568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6154673Z getattr(self, test_name)() 2025-12-04T09:59:13.6155192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6155277Z fn() 2025-12-04T09:59:13.6155758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6155866Z method(*args, **kwargs) 2025-12-04T09:59:13.6156343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6156481Z method(*args, **kwargs) 2025-12-04T09:59:13.6156963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6157055Z with policy(): 2025-12-04T09:59:13.6157544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6157650Z raise RuntimeError(msg) 2025-12-04T09:59:13.6158805Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
2025-12-04T09:59:13.6158819Z 2025-12-04T09:59:13.6159020Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6159667Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6159674Z 2025-12-04T09:59:13.6159935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6159941Z 2025-12-04T09:59:13.6160090Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.6160213Z Traceback (most recent call last): 2025-12-04T09:59:13.6160758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6160863Z getattr(self, test_name)() 2025-12-04T09:59:13.6161375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6161462Z fn() 2025-12-04T09:59:13.6161945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6162056Z method(*args, **kwargs) 2025-12-04T09:59:13.6162533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6162641Z method(*args, **kwargs) 2025-12-04T09:59:13.6163114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6163205Z with policy(): 2025-12-04T09:59:13.6163719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6163826Z raise RuntimeError(msg) 2025-12-04T09:59:13.6164963Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.6164979Z 2025-12-04T09:59:13.6165180Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6165828Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6165860Z 2025-12-04T09:59:13.6166119Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6166288Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
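The ProcessGroupNCCL warning earlier in this run ("destroy_process_group() was not called before program exit") is advisory, but acting on it avoids leaked NCCL resources in longer-lived jobs. A minimal teardown sketch, with `run_workload` as an illustrative callable:

import torch.distributed as dist

def main(run_workload) -> None:
    try:
        run_workload()
    finally:
        # Tear the process group down explicitly instead of relying on interpreter exit,
        # which is what the NCCL warning in this log is about.
        if dist.is_available() and dist.is_initialized():
            dist.destroy_process_group()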
2025-12-04T09:59:13.6166467Z ====================== 1 failed, 26 deselected in 11.44s ======================= 2025-12-04T09:59:13.6166556Z Got exit code 1 2025-12-04T09:59:13.6167130Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6167519Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.6168100Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-500277f28031837e.xml 2025-12-04T09:59:13.6168290Z ============================= test session starts ============================== 2025-12-04T09:59:13.6168621Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6168721Z cachedir: .pytest_cache 2025-12-04T09:59:13.6169211Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6169326Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6169425Z configfile: pytest.ini 2025-12-04T09:59:13.6169933Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6170137Z collecting ... collected 60 items / 14 deselected / 46 selected 2025-12-04T09:59:13.6170277Z stepcurrent: skipping 14 already run items. 2025-12-04T09:59:13.6170447Z Running 13 items in this shard 2025-12-04T09:59:13.6170452Z 2025-12-04T09:59:13.6171507Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda I1204 09:49:12.004000 68466 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 68518 2025-12-04T09:59:13.6181043Z I1204 09:49:12.005000 68466 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 68519 2025-12-04T09:59:13.6181737Z I1204 09:49:12.005000 68466 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 68520 2025-12-04T09:59:13.6182245Z I1204 09:49:12.006000 68466 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 68521 2025-12-04T09:59:13.6183254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6183399Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6184383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6184518Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6185547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6185679Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6186673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6186805Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6188851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6189176Z _warn_cpu_init() 2025-12-04T09:59:13.6191078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6191204Z _warn_cpu_init() 2025-12-04T09:59:13.6193093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6193192Z _warn_cpu_init() 2025-12-04T09:59:13.6195078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6195180Z _warn_cpu_init() 2025-12-04T09:59:13.6196117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6196334Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6197296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6197505Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6198444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
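The FutureWarning above says the `NO_SHARD` sharding strategy is deprecated and recommends DistributedDataParallel for the unsharded case. A minimal sketch of that substitution, with illustrative names and assuming the process group is already initialized:

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_unsharded(model: nn.Module, local_rank: int) -> DDP:
    # DDP replicates parameters on every rank, which is what FSDP's NO_SHARD strategy provided.
    model = model.cuda(local_rank)
    return DDP(model, device_ids=[local_rank])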
2025-12-04T09:59:13.6198650Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6199627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6199834Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6204089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6204489Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6208715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6209121Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6213371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6213748Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6218404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6218807Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6219589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6219743Z return func(*args, **kwargs) 2025-12-04T09:59:13.6220522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6220645Z return func(*args, **kwargs) 2025-12-04T09:59:13.6221620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6221733Z return func(*args, **kwargs) 2025-12-04T09:59:13.6222505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6222702Z return func(*args, **kwargs) 2025-12-04T09:59:13.6223463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6223573Z return func(*args, **kwargs) 2025-12-04T09:59:13.6224332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6224447Z return func(*args, **kwargs) 2025-12-04T09:59:13.6225198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
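The AccumulateGrad stream-mismatch warnings above state that the message can be silenced when the mismatch is intentional. A minimal sketch of that call, assuming a PyTorch build recent enough to expose it (as this log's build does); it only affects the warning, not any synchronization:

    import torch

    # Silences the AccumulateGrad stream-mismatch warning described above.
    # It does not change stream synchronization behavior.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)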
2025-12-04T09:59:13.6225312Z return func(*args, **kwargs) 2025-12-04T09:59:13.6226067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6226175Z return func(*args, **kwargs) 2025-12-04T09:59:13.6227188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.6227300Z return func(*args, **kwargs) 2025-12-04T09:59:13.6227814Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6228354Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6229358Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6229880Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6230905Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6231318Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6232282Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6232788Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6233819Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6234322Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6235237Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6235657Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6236573Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6237061Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6238663Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a 
leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T09:59:13.6239006Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6239623Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6240719Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6241062Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6241776Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6242290Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6242724Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6243220Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6244165Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6244680Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6245800Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6246198Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6247123Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6247600Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6248590Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6249061Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6249992Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T09:59:13.6250424Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6251399Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6251878Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6253524Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 609157120 and is now 674168832. 2025-12-04T09:59:13.6253877Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6254519Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6255647Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6256029Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6256983Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6257534Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6258000Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6258529Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6259968Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6260490Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6261476Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6261888Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6262854Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6263374Z [rank3]:E1204 09:49:19.711000 68521 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6264342Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6264832Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6265800Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6266278Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6267260Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6267754Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6269623Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 3. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T09:59:13.6269951Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6270534Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6271592Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6271916Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6272558Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6273042Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6273453Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6273951Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6274850Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6275307Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, 
test_name)() 2025-12-04T09:59:13.6276178Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6276541Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6277434Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6277867Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6278728Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6279189Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6280054Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6280460Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6281329Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6281764Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6283274Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 1. CUDA driver allocated memory was 604962816 and is now 674168832. 
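Each rank fails the post-test CUDA memory leak check: the caching-allocator and driver-level byte counts are higher after the test than before it. A rough, illustrative sketch of that kind of before/after comparison follows; the real check lives in torch.testing._internal.common_utils and is more involved, so the names and structure below are placeholders, not the actual implementation:

    import torch

    def snapshot(device: int):
        # Bytes currently held by the caching allocator on this device.
        alloc = torch.cuda.memory_allocated(device)
        # Driver-level view: total minus free memory on the device.
        free, total = torch.cuda.mem_get_info(device)
        return alloc, total - free

    before_alloc, before_driver = snapshot(0)
    # ... run the suspect test body here ...
    torch.cuda.synchronize(0)
    after_alloc, after_driver = snapshot(0)
    if after_alloc > before_alloc or after_driver > before_driver:
        print(f"possible leak: allocator {before_alloc} -> {after_alloc} bytes, "
              f"driver {before_driver} -> {after_driver} bytes")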
2025-12-04T09:59:13.6283599Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6284210Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6285235Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6285559Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6286204Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6286690Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6286810Z dist init r=0, world=4 2025-12-04T09:59:13.6286912Z dist init r=3, world=4 2025-12-04T09:59:13.6287000Z dist init r=1, world=4 2025-12-04T09:59:13.6287094Z dist init r=2, world=4 2025-12-04T09:59:13.6288121Z [rank0]:[W1204 09:49:20.726194359 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6288210Z FAILED [9.6776s] [ 7%] 2025-12-04T09:59:13.6288219Z 2025-12-04T09:59:13.6288363Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6288643Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda __ 2025-12-04T09:59:13.6288786Z Traceback (most recent call last): 2025-12-04T09:59:13.6289277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6289376Z self._join_processes(fn) 2025-12-04T09:59:13.6289910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6290038Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6290577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6290685Z raise RuntimeError(error) 2025-12-04T09:59:13.6290893Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6291037Z Traceback (most recent call last): 2025-12-04T09:59:13.6291524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6291626Z getattr(self, test_name)() 2025-12-04T09:59:13.6292113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6292199Z fn() 2025-12-04T09:59:13.6292649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6292745Z method(*args, **kwargs) 2025-12-04T09:59:13.6293200Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6293287Z method(*args, **kwargs) 2025-12-04T09:59:13.6293743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6293830Z with policy(): 2025-12-04T09:59:13.6294279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6294385Z raise RuntimeError(msg) 2025-12-04T09:59:13.6295506Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T09:59:13.6295514Z 2025-12-04T09:59:13.6295716Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6296416Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6296424Z 2025-12-04T09:59:13.6296662Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6296675Z 2025-12-04T09:59:13.6296847Z 2025-12-04T09:59:13.6297076Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6297343Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6298187Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-500277f28031837e.xml - 2025-12-04T09:59:13.6298361Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6299261Z FAILED [9.6776s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6299382Z Traceback (most recent call last): 2025-12-04T09:59:13.6299935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6300056Z getattr(self, test_name)() 2025-12-04T09:59:13.6300590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6300709Z fn() 2025-12-04T09:59:13.6301222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6301330Z method(*args, **kwargs) 2025-12-04T09:59:13.6301848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6301953Z method(*args, **kwargs) 2025-12-04T09:59:13.6302458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6302564Z with policy(): 2025-12-04T09:59:13.6303070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6303210Z raise RuntimeError(msg) 2025-12-04T09:59:13.6304445Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T09:59:13.6304456Z 2025-12-04T09:59:13.6304670Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6305376Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6305382Z 2025-12-04T09:59:13.6305647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6305834Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6306010Z ======================= 1 failed, 14 deselected in 9.89s ======================= 2025-12-04T09:59:13.6306108Z Got exit code 1 2025-12-04T09:59:13.6306222Z Retrying single test... 2025-12-04T09:59:13.6306844Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-942d56c07e16c88d.xml 2025-12-04T09:59:13.6307004Z ============================= test session starts ============================== 2025-12-04T09:59:13.6307389Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6307498Z cachedir: .pytest_cache 2025-12-04T09:59:13.6308025Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6308142Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6308247Z configfile: pytest.ini 2025-12-04T09:59:13.6308894Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6309217Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6309921Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6310020Z Running 1 items in this shard 2025-12-04T09:59:13.6310054Z 2025-12-04T09:59:13.6310997Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda I1204 09:49:26.494000 68803 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 68855 2025-12-04T09:59:13.6311446Z I1204 09:49:26.495000 68803 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 68856 2025-12-04T09:59:13.6311882Z I1204 09:49:26.495000 68803 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 68857 2025-12-04T09:59:13.6312324Z I1204 09:49:26.496000 68803 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 68858 2025-12-04T09:59:13.6313432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6313592Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6314535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6314658Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6315595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6315745Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6316668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6316802Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6318717Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6318816Z _warn_cpu_init() 2025-12-04T09:59:13.6320709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6320982Z _warn_cpu_init() 2025-12-04T09:59:13.6323229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6323349Z _warn_cpu_init() 2025-12-04T09:59:13.6325405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6325514Z _warn_cpu_init() 2025-12-04T09:59:13.6326510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6326730Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6327731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6327991Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6328999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6329220Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6330207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6330446Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6335164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6335561Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6340413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6340815Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6345578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. 
This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6346003Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6350556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6350979Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6351733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6351851Z return func(*args, **kwargs) 2025-12-04T09:59:13.6352596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6352707Z return func(*args, **kwargs) 2025-12-04T09:59:13.6353794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6353899Z return func(*args, **kwargs) 2025-12-04T09:59:13.6354591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6354716Z return func(*args, **kwargs) 2025-12-04T09:59:13.6355386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6355488Z return func(*args, **kwargs) 2025-12-04T09:59:13.6356162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
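The _warn_cpu_init UserWarnings above recommend passing device_id so FSDP moves the CPU module to the GPU before sharding, which also satisfies the GPU requirement of sync_module_states=True. A minimal hedged sketch of that call, assuming a process group is already initialized and one GPU per rank:

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    cpu_module = nn.Linear(8, 8)  # starts on CPU, as in the warning above
    # device_id moves the module to the GPU for sharding initialization and
    # satisfies the GPU requirement of sync_module_states=True.
    model = FSDP(cpu_module, device_id=torch.cuda.current_device(),
                 sync_module_states=True)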
2025-12-04T09:59:13.6356266Z return func(*args, **kwargs) 2025-12-04T09:59:13.6356937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6357036Z return func(*args, **kwargs) 2025-12-04T09:59:13.6357745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6357843Z return func(*args, **kwargs) 2025-12-04T09:59:13.6358737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.6358830Z return func(*args, **kwargs) 2025-12-04T09:59:13.6359241Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6359731Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6360658Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6361127Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6362004Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6362359Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6363255Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6363690Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6364554Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6364984Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6365844Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6366243Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6367101Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6367568Z [rank1]:E1204 
09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6369299Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 1. CUDA driver allocated memory was 611254272 and is now 674168832. 2025-12-04T09:59:13.6369660Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6370278Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6371405Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6371748Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6372419Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6372937Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6373364Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6373906Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6374854Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6375340Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6376535Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6377146Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6378120Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6378612Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6379583Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6380075Z [rank0]:E1204 09:49:34.128000 68855 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6381046Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6381504Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6382515Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6383015Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6384687Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 718209024 and is now 783220736. 2025-12-04T09:59:13.6385064Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6385751Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6386912Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6387277Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6387990Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6388580Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6389133Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6389619Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6390505Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6390962Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6391868Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6392223Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T09:59:13.6393315Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6393780Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6394885Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6395367Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6396295Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6396768Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6397729Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6398212Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6399868Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 604962816 and is now 674168832. 
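The repeated FutureWarnings in this run note that the NO_SHARD sharding strategy is deprecated in favor of DistributedDataParallel. A minimal sketch of the suggested replacement, assuming an initialized process group and one GPU per rank:

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    rank = dist.get_rank()
    device = torch.device("cuda", rank)

    model = nn.Linear(8, 8).to(device)
    # Replaces FSDP(..., sharding_strategy=ShardingStrategy.NO_SHARD).
    ddp_model = DDP(model, device_ids=[rank])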
2025-12-04T09:59:13.6400224Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6400868Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6401978Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6402340Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6403064Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6403601Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6404039Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6404549Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6405529Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6406054Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6407026Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6407413Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6408453Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6408926Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6409830Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6410407Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6411289Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6411695Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6412545Z [rank3]:E1204 09:49:34.130000 68858 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6412981Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6414571Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T09:59:13.6414893Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6415689Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6417046Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6417619Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6418347Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6418900Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6419004Z dist init r=0, world=4 2025-12-04T09:59:13.6419103Z dist init r=2, world=4 2025-12-04T09:59:13.6419207Z dist init r=3, world=4 2025-12-04T09:59:13.6419338Z dist init r=1, world=4 2025-12-04T09:59:13.6420490Z [rank0]:[W1204 09:49:34.147198041 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6420600Z FAILED [9.3682s] [100%] 2025-12-04T09:59:13.6420606Z 2025-12-04T09:59:13.6420953Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6421314Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda __ 2025-12-04T09:59:13.6421441Z Traceback (most recent call last): 2025-12-04T09:59:13.6421991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6422111Z self._join_processes(fn) 2025-12-04T09:59:13.6422698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6422840Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6423463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6423573Z raise RuntimeError(error) 2025-12-04T09:59:13.6423814Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.6423933Z Traceback (most recent call last): 2025-12-04T09:59:13.6424539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6424662Z getattr(self, test_name)() 2025-12-04T09:59:13.6425196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6425292Z fn() 2025-12-04T09:59:13.6425797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6425900Z method(*args, **kwargs) 2025-12-04T09:59:13.6426433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6426539Z method(*args, **kwargs) 2025-12-04T09:59:13.6427080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6427187Z with policy(): 2025-12-04T09:59:13.6427697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6427812Z raise RuntimeError(msg) 2025-12-04T09:59:13.6429039Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 604962816 and is now 674168832. 2025-12-04T09:59:13.6429048Z 2025-12-04T09:59:13.6429260Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6429973Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6430016Z 2025-12-04T09:59:13.6430282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6430288Z 2025-12-04T09:59:13.6430293Z 2025-12-04T09:59:13.6430519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6430778Z Process 2 terminated with exit code 10, terminating remaining processes. 
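The ProcessGroupNCCL warning above notes that destroy_process_group() was never called before the worker processes exited. As a point of reference only (this is not the test harness's actual code), a minimal sketch of the shutdown pattern that warning asks for, assuming a torchrun-style launcher sets RANK/WORLD_SIZE/LOCAL_RANK:

import os
import torch
import torch.distributed as dist

def run_worker() -> None:
    # Illustrative worker body; the real failing test lives in test_fsdp_core.py.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    try:
        dist.barrier()  # placeholder for the actual test work
    finally:
        # Explicit teardown avoids the "destroy_process_group() was not called
        # before program exit" warning seen in the log above.
        dist.destroy_process_group()

if __name__ == "__main__":
    run_worker()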
2025-12-04T09:59:13.6431591Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-942d56c07e16c88d.xml - 2025-12-04T09:59:13.6431764Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6432785Z FAILED [9.3682s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.6433029Z Traceback (most recent call last): 2025-12-04T09:59:13.6433548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6433654Z getattr(self, test_name)() 2025-12-04T09:59:13.6434282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6434366Z fn() 2025-12-04T09:59:13.6434820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6434914Z method(*args, **kwargs) 2025-12-04T09:59:13.6435363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6435464Z method(*args, **kwargs) 2025-12-04T09:59:13.6435909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6436004Z with policy(): 2025-12-04T09:59:13.6436458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6436552Z raise RuntimeError(msg) 2025-12-04T09:59:13.6437677Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 604962816 and is now 674168832. 2025-12-04T09:59:13.6437683Z 2025-12-04T09:59:13.6437876Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6438504Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6438509Z 2025-12-04T09:59:13.6438744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6438901Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6439094Z ======================= 1 failed, 26 deselected in 9.58s ======================= 2025-12-04T09:59:13.6439181Z Got exit code 1 2025-12-04T09:59:13.6439274Z Retrying single test... 
2025-12-04T09:59:13.6439835Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-55fdf9ad8e0a27f0.xml 2025-12-04T09:59:13.6439975Z ============================= test session starts ============================== 2025-12-04T09:59:13.6440288Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6440381Z cachedir: .pytest_cache 2025-12-04T09:59:13.6440844Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6440984Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6441076Z configfile: pytest.ini 2025-12-04T09:59:13.6441742Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6441951Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6442681Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6442788Z Running 1 items in this shard 2025-12-04T09:59:13.6442793Z 2025-12-04T09:59:13.6443796Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda I1204 09:49:40.814000 69140 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 69192 2025-12-04T09:59:13.6444310Z I1204 09:49:40.815000 69140 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 69193 2025-12-04T09:59:13.6444954Z I1204 09:49:40.815000 69140 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 69194 2025-12-04T09:59:13.6445436Z I1204 09:49:40.816000 69140 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 69195 2025-12-04T09:59:13.6446414Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6446544Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6448508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6448605Z _warn_cpu_init() 2025-12-04T09:59:13.6449618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6449831Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6450791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6450929Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6451881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6452013Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6452986Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6453112Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6455137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6455267Z _warn_cpu_init() 2025-12-04T09:59:13.6457640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6457743Z _warn_cpu_init() 2025-12-04T09:59:13.6459780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6459914Z _warn_cpu_init() 2025-12-04T09:59:13.6460942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6461165Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6462152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6462375Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6463375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6463600Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6468128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6468557Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6469431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6469548Z return func(*args, **kwargs) 2025-12-04T09:59:13.6473971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6474384Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6478802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6479219Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6479975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6480081Z return func(*args, **kwargs) 2025-12-04T09:59:13.6484450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6484832Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6485611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6485719Z return func(*args, **kwargs) 2025-12-04T09:59:13.6486456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6486571Z return func(*args, **kwargs) 2025-12-04T09:59:13.6487301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6487404Z return func(*args, **kwargs) 2025-12-04T09:59:13.6488190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6488305Z return func(*args, **kwargs) 2025-12-04T09:59:13.6489048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6489154Z return func(*args, **kwargs) 2025-12-04T09:59:13.6489885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
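The FutureWarning repeated above deprecates the NO_SHARD sharding strategy and points to DistributedDataParallel as the replacement. A minimal sketch of that substitution on a hypothetical toy model (launcher environment variables are assumed to come from torchrun); it is illustrative only, not the test's actual model setup:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main() -> None:
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Hypothetical toy model; wrapping with DDP replaces the deprecated
    # FSDP(..., sharding_strategy=ShardingStrategy.NO_SHARD) pattern.
    model = torch.nn.Linear(8, 8).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    loss = ddp_model(torch.randn(4, 8, device=local_rank)).sum()
    loss.backward()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()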
2025-12-04T09:59:13.6490021Z return func(*args, **kwargs) 2025-12-04T09:59:13.6490993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.6491114Z return func(*args, **kwargs) 2025-12-04T09:59:13.6491560Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6492080Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6493059Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6493554Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6494522Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6494909Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6495871Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6496410Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6497541Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6498041Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6499038Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6499497Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6500457Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6500954Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6502642Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 
2025-12-04T09:59:13.6503048Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6503736Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6504891Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6505293Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6506006Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6506559Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6507009Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6507537Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6508658Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6509141Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6510082Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6510482Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6511395Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6511849Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6512752Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6513429Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6514362Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6514804Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6515742Z [rank1]:E1204 09:49:48.467000 69193 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6516229Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6517906Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 1. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T09:59:13.6518264Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6518903Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6520049Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6520407Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6521419Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6521978Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6522428Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6522956Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6523959Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6524466Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6525525Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6525923Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6526889Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6527376Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6528374Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6528872Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6529828Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6530282Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6531242Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6531781Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6533586Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 609157120 and is now 674168832. 2025-12-04T09:59:13.6533943Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6534645Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6535763Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6536123Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6537062Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6537620Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6538074Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6538598Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6539702Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6540214Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6541211Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6541608Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6542570Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6543097Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6544061Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6544553Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6545510Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6545961Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6546957Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6547456Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6549241Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 3. CUDA driver allocated memory was 604962816 and is now 674168832. 
2025-12-04T09:59:13.6549628Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6550278Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6551394Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6551752Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6552440Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6552974Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6553074Z dist init r=1, world=4 2025-12-04T09:59:13.6553168Z dist init r=0, world=4 2025-12-04T09:59:13.6553269Z dist init r=2, world=4 2025-12-04T09:59:13.6553362Z dist init r=3, world=4 2025-12-04T09:59:13.6554534Z [rank0]:[W1204 09:49:48.479500936 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6554640Z FAILED [9.6858s] [100%] 2025-12-04T09:59:13.6554646Z 2025-12-04T09:59:13.6554789Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6555097Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda __ 2025-12-04T09:59:13.6555213Z Traceback (most recent call last): 2025-12-04T09:59:13.6555745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6555864Z self._join_processes(fn) 2025-12-04T09:59:13.6556430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6556600Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6557191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6557298Z raise RuntimeError(error) 2025-12-04T09:59:13.6557527Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6557639Z Traceback (most recent call last): 2025-12-04T09:59:13.6558165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6558283Z getattr(self, test_name)() 2025-12-04T09:59:13.6558797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6558915Z fn() 2025-12-04T09:59:13.6559407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6559511Z method(*args, **kwargs) 2025-12-04T09:59:13.6560004Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6560105Z method(*args, **kwargs) 2025-12-04T09:59:13.6560590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6560692Z with policy(): 2025-12-04T09:59:13.6561183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6561325Z raise RuntimeError(msg) 2025-12-04T09:59:13.6562520Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T09:59:13.6562531Z 2025-12-04T09:59:13.6562744Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6563426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6563432Z 2025-12-04T09:59:13.6563688Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6563693Z 2025-12-04T09:59:13.6563697Z 2025-12-04T09:59:13.6563915Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6564168Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6564953Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-55fdf9ad8e0a27f0.xml - 2025-12-04T09:59:13.6565118Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6565982Z FAILED [9.6858s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6566110Z Traceback (most recent call last): 2025-12-04T09:59:13.6566646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6566814Z getattr(self, test_name)() 2025-12-04T09:59:13.6567333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6567418Z fn() 2025-12-04T09:59:13.6567913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6568016Z method(*args, **kwargs) 2025-12-04T09:59:13.6568532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6568637Z method(*args, **kwargs) 2025-12-04T09:59:13.6569130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6569228Z with policy(): 2025-12-04T09:59:13.6569719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6569821Z raise RuntimeError(msg) 2025-12-04T09:59:13.6571020Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T09:59:13.6571053Z 2025-12-04T09:59:13.6571259Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6571947Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6571953Z 2025-12-04T09:59:13.6572208Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6572383Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6572563Z ======================= 1 failed, 26 deselected in 9.91s ======================= 2025-12-04T09:59:13.6572657Z Got exit code 1 2025-12-04T09:59:13.6573287Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6573678Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.6574281Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e1cdaa245647d1a.xml 2025-12-04T09:59:13.6574450Z ============================= test session starts ============================== 2025-12-04T09:59:13.6574787Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6574899Z cachedir: .pytest_cache 2025-12-04T09:59:13.6575394Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6575511Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6575621Z configfile: pytest.ini 2025-12-04T09:59:13.6576145Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6576427Z collecting ... collected 60 items / 15 deselected / 45 selected 2025-12-04T09:59:13.6576575Z stepcurrent: skipping 15 already run items. 2025-12-04T09:59:13.6576853Z Running 12 items in this shard 2025-12-04T09:59:13.6576860Z 2025-12-04T09:59:13.6577958Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda I1204 09:49:55.183000 69477 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 69529 2025-12-04T09:59:13.6578456Z I1204 09:49:55.184000 69477 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 69530 2025-12-04T09:59:13.6578945Z I1204 09:49:55.185000 69477 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 69531 2025-12-04T09:59:13.6579443Z I1204 09:49:55.186000 69477 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 69532 2025-12-04T09:59:13.6581512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6581624Z _warn_cpu_init() 2025-12-04T09:59:13.6583635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6583747Z _warn_cpu_init() 2025-12-04T09:59:13.6585759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6585894Z _warn_cpu_init() 2025-12-04T09:59:13.6587908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6588042Z _warn_cpu_init() 2025-12-04T09:59:13.6589151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
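Both UserWarnings in this run ask for an explicit device: the FSDP CPU-init warning recommends the device_id argument to FSDP, and the barrier() warning recommends device_id in init_process_group. A sketch combining the two suggestions on a hypothetical toy module (again assuming a torchrun-style launcher), not the harness's real initialization:

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main() -> None:
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)

    # Binding the process group to one GPU silences the "barrier(): using the
    # device under current context" warning.
    dist.init_process_group(backend="nccl", device_id=device)

    # Hypothetical CPU-resident module; device_id lets FSDP move it to the GPU
    # for sharding initialization instead of warning about CPU init, and it is
    # also what sync_module_states=True needs for its GPU communication.
    module = torch.nn.Linear(8, 8)
    model = FSDP(module, device_id=device, sync_module_states=True)

    model(torch.randn(4, 8, device=device)).sum().backward()
    dist.barrier()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()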
2025-12-04T09:59:13.6589265Z return func(*args, **kwargs) 2025-12-04T09:59:13.6589721Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6590243Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6591243Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6591739Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6592708Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6593147Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6594087Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6594565Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6595499Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6596007Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6596934Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6597365Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6598318Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6598795Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6600456Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 586088448 and is now 649003008. 
2025-12-04T09:59:13.6600815Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6601471Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6602601Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6602955Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6603668Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6604198Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6604646Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6605185Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6606168Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6606664Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6607654Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6608054Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6608987Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6609477Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6610441Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6610922Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6611867Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6612302Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6613249Z [rank0]:E1204 09:50:03.412000 69529 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6613760Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6615389Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T09:59:13.6615747Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6616511Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6617830Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6618201Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6618926Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6619472Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6619936Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6620468Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6622015Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6622905Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6623905Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6624320Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6625283Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6625837Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6626802Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6627294Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6628264Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6628712Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6629738Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6630229Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6631903Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.6632304Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6632976Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6634214Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6634569Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6635278Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6635811Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6636254Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6636773Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6637842Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6638347Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6639304Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6639702Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6640664Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6641152Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6642082Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6642558Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6643506Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6643972Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6644919Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6645393Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6647017Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 604962816 and is now 649003008. 
2025-12-04T09:59:13.6647415Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6648068Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6649168Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6649519Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6650226Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6650755Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6650866Z dist init r=3, world=4 2025-12-04T09:59:13.6650963Z dist init r=2, world=4 2025-12-04T09:59:13.6651086Z dist init r=1, world=4 2025-12-04T09:59:13.6651191Z dist init r=0, world=4 2025-12-04T09:59:13.6652313Z [rank0]:[W1204 09:50:03.455570687 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6652413Z FAILED [9.8998s] [ 8%] 2025-12-04T09:59:13.6652429Z 2025-12-04T09:59:13.6652573Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6652873Z ___ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda ____ 2025-12-04T09:59:13.6653002Z Traceback (most recent call last): 2025-12-04T09:59:13.6653534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6653673Z self._join_processes(fn) 2025-12-04T09:59:13.6654258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6654395Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6654996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6655107Z raise RuntimeError(error) 2025-12-04T09:59:13.6655336Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.6655463Z Traceback (most recent call last): 2025-12-04T09:59:13.6655992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6656131Z getattr(self, test_name)() 2025-12-04T09:59:13.6656925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6657023Z fn() 2025-12-04T09:59:13.6657548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6657657Z method(*args, **kwargs) 2025-12-04T09:59:13.6658169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.6658287Z method(*args, **kwargs) 2025-12-04T09:59:13.6658796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6658934Z with policy(): 2025-12-04T09:59:13.6659455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6659570Z raise RuntimeError(msg) 2025-12-04T09:59:13.6660805Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 586088448 and is now 649003008. 2025-12-04T09:59:13.6660812Z 2025-12-04T09:59:13.6661031Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6661715Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6661730Z 2025-12-04T09:59:13.6661997Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6662004Z 2025-12-04T09:59:13.6662008Z 2025-12-04T09:59:13.6662227Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6662502Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6663310Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e1cdaa245647d1a.xml - 2025-12-04T09:59:13.6663523Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6664366Z FAILED [9.8998s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.6664490Z Traceback (most recent call last): 2025-12-04T09:59:13.6665054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6665172Z getattr(self, test_name)() 2025-12-04T09:59:13.6665721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6665813Z fn() 2025-12-04T09:59:13.6666325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6666469Z method(*args, **kwargs) 2025-12-04T09:59:13.6666980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6667082Z method(*args, **kwargs) 2025-12-04T09:59:13.6667595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6667692Z with policy(): 2025-12-04T09:59:13.6668215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6668329Z raise RuntimeError(msg) 2025-12-04T09:59:13.6669626Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! 
Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 586088448 and is now 649003008. 2025-12-04T09:59:13.6669663Z 2025-12-04T09:59:13.6669888Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6670555Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6670561Z 2025-12-04T09:59:13.6670828Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6671003Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6671202Z ====================== 1 failed, 15 deselected in 10.12s ======================= 2025-12-04T09:59:13.6671307Z Got exit code 1 2025-12-04T09:59:13.6671410Z Retrying single test... 2025-12-04T09:59:13.6672028Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a996648fbbff19f5.xml 2025-12-04T09:59:13.6672188Z ============================= test session starts ============================== 2025-12-04T09:59:13.6672535Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6672650Z cachedir: .pytest_cache 2025-12-04T09:59:13.6673149Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6673268Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6673380Z configfile: pytest.ini 2025-12-04T09:59:13.6673896Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6674121Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6674862Z stepcurrent: skipping 15 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6674973Z Running 1 items in this shard 2025-12-04T09:59:13.6674978Z 2025-12-04T09:59:13.6676058Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda I1204 09:50:09.934000 69814 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 69866 2025-12-04T09:59:13.6676540Z I1204 09:50:09.935000 69814 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 69867 2025-12-04T09:59:13.6677026Z I1204 09:50:09.936000 69814 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 69868 2025-12-04T09:59:13.6677510Z I1204 09:50:09.936000 69814 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 69869 2025-12-04T09:59:13.6679547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.6679648Z _warn_cpu_init() 2025-12-04T09:59:13.6681595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6681707Z _warn_cpu_init() 2025-12-04T09:59:13.6683693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6683794Z _warn_cpu_init() 2025-12-04T09:59:13.6685737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6685878Z _warn_cpu_init() 2025-12-04T09:59:13.6686850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
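The barrier() warning at the end of this block suggests binding the process group to an explicit device at init time. A minimal sketch, assuming RANK and WORLD_SIZE come from the launcher environment and MASTER_ADDR/MASTER_PORT are already set:

```python
import os
import torch
import torch.distributed as dist

rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])

dist.init_process_group(
    backend="nccl",
    rank=rank,
    world_size=world_size,
    device_id=torch.device("cuda", rank),  # bind collectives to an explicit CUDA device
)
dist.barrier()  # no "using the device under current context" warning
```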
2025-12-04T09:59:13.6686967Z return func(*args, **kwargs) 2025-12-04T09:59:13.6687416Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6687940Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6688914Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6689416Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6690525Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6690906Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6691812Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6692281Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6693190Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6693687Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6694594Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6695024Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6695934Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6696477Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6698380Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 
2025-12-04T09:59:13.6698748Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6699422Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6700593Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6701002Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6701722Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6702278Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6702735Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6703271Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6704287Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6704804Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6705836Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6706236Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6707192Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6707697Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6708805Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6709401Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6710303Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6710740Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6711649Z [rank3]:E1204 09:50:18.090000 69869 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6712146Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6713736Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.6714081Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6714750Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6715841Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6716201Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6716880Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6717405Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6717834Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6718337Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6719323Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6719803Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6720882Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6721455Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6722418Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6722984Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6723954Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6724454Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6725425Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6725950Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6726915Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6727410Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6729087Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.6729491Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6730169Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6731303Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6731684Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6732400Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6732944Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6733412Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6734034Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6735024Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6735503Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6736519Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6737090Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6738098Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6738599Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6739565Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6740072Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6741032Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6741523Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6742488Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6742983Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6744682Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 611254272 and is now 649003008. 
2025-12-04T09:59:13.6745052Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6745730Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6746866Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6747237Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6747951Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6748499Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6748730Z dist init r=0, world=4 2025-12-04T09:59:13.6748861Z dist init r=1, world=4 2025-12-04T09:59:13.6748970Z dist init r=2, world=4 2025-12-04T09:59:13.6749178Z dist init r=3, world=4 2025-12-04T09:59:13.6750270Z [rank0]:[W1204 09:50:18.109269972 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6750381Z FAILED [10.1350s] [100%] 2025-12-04T09:59:13.6750386Z 2025-12-04T09:59:13.6750527Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6750836Z ___ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda ____ 2025-12-04T09:59:13.6750951Z Traceback (most recent call last): 2025-12-04T09:59:13.6751515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6751629Z self._join_processes(fn) 2025-12-04T09:59:13.6752188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6752322Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6753024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6753132Z raise RuntimeError(error) 2025-12-04T09:59:13.6753355Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.6753460Z Traceback (most recent call last): 2025-12-04T09:59:13.6753939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6754082Z getattr(self, test_name)() 2025-12-04T09:59:13.6754556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6754636Z fn() 2025-12-04T09:59:13.6755094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6755189Z method(*args, **kwargs) 2025-12-04T09:59:13.6755643Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6755733Z method(*args, **kwargs) 2025-12-04T09:59:13.6756210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6756308Z with policy(): 2025-12-04T09:59:13.6756762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6756861Z raise RuntimeError(msg) 2025-12-04T09:59:13.6757953Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.6757958Z 2025-12-04T09:59:13.6758150Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6758768Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6758775Z 2025-12-04T09:59:13.6759011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6759016Z 2025-12-04T09:59:13.6759020Z 2025-12-04T09:59:13.6759227Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6759465Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6760207Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a996648fbbff19f5.xml - 2025-12-04T09:59:13.6760374Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6761122Z FAILED [10.1350s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.6761245Z Traceback (most recent call last): 2025-12-04T09:59:13.6761738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6761836Z getattr(self, test_name)() 2025-12-04T09:59:13.6762319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6762401Z fn() 2025-12-04T09:59:13.6762888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6762986Z method(*args, **kwargs) 2025-12-04T09:59:13.6763438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6763540Z method(*args, **kwargs) 2025-12-04T09:59:13.6764014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6764100Z with policy(): 2025-12-04T09:59:13.6764565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6764665Z raise RuntimeError(msg) 2025-12-04T09:59:13.6765752Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.6765784Z 2025-12-04T09:59:13.6765977Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6766583Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6766596Z 2025-12-04T09:59:13.6766828Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6766987Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6767186Z ====================== 1 failed, 26 deselected in 10.35s ======================= 2025-12-04T09:59:13.6767274Z Got exit code 1 2025-12-04T09:59:13.6767369Z Retrying single test... 2025-12-04T09:59:13.6768124Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc1573489c80017b.xml 2025-12-04T09:59:13.6768279Z ============================= test session starts ============================== 2025-12-04T09:59:13.6768610Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6768712Z cachedir: .pytest_cache 2025-12-04T09:59:13.6769198Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6769317Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6769421Z configfile: pytest.ini 2025-12-04T09:59:13.6769930Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6770141Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6770859Z stepcurrent: skipping 15 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6770976Z Running 1 items in this shard 2025-12-04T09:59:13.6770983Z 2025-12-04T09:59:13.6771995Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda I1204 09:50:24.834000 70151 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 70203 2025-12-04T09:59:13.6772465Z I1204 09:50:24.835000 70151 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 70204 2025-12-04T09:59:13.6772939Z I1204 09:50:24.836000 70151 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 70205 2025-12-04T09:59:13.6773401Z I1204 09:50:24.836000 70151 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 70206 2025-12-04T09:59:13.6775349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.6775444Z _warn_cpu_init() 2025-12-04T09:59:13.6777657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6777802Z _warn_cpu_init() 2025-12-04T09:59:13.6779821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6779922Z _warn_cpu_init() 2025-12-04T09:59:13.6781945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6782076Z _warn_cpu_init() 2025-12-04T09:59:13.6783088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.6783210Z return func(*args, **kwargs) 2025-12-04T09:59:13.6783678Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6784228Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6785234Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6785743Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6786775Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6787179Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6788154Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6788763Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6789760Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6790223Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6791070Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6791475Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6792331Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6792802Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6794286Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 
2025-12-04T09:59:13.6794617Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6795204Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6796249Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6796575Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6797211Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6797706Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6798130Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6798614Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6799503Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6799997Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6800886Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6801240Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6802101Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6802540Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6803607Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6804070Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6804966Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6805398Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6806311Z [rank1]:E1204 09:50:32.934000 70204 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6806815Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6808387Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T09:59:13.6808778Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6809397Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6810481Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6810825Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6811503Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6812027Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6812451Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6812964Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6813936Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6814418Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6815359Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6815736Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6816899Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6817434Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6818410Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6818897Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6819861Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6820359Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6821545Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6822057Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6823732Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.6824188Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6824852Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6825987Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6826364Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6827078Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6827646Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6828101Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6828687Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6829687Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6830195Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6831195Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6831597Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6832728Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6833316Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6834237Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6834701Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6835653Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6836086Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6836993Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6837493Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6839277Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.6839657Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6840300Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6841399Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6841766Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6842459Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6842999Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6843126Z dist init r=0, world=4 2025-12-04T09:59:13.6843226Z dist init r=3, world=4 2025-12-04T09:59:13.6843330Z dist init r=1, world=4 2025-12-04T09:59:13.6843424Z dist init r=2, world=4 2025-12-04T09:59:13.6844564Z [rank0]:[W1204 09:50:33.943332770 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6844666Z FAILED [10.2022s] [100%] 2025-12-04T09:59:13.6844672Z 2025-12-04T09:59:13.6844813Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6845128Z ___ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda ____ 2025-12-04T09:59:13.6845248Z Traceback (most recent call last): 2025-12-04T09:59:13.6845816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6845930Z self._join_processes(fn) 2025-12-04T09:59:13.6846500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6846647Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6847235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6847350Z raise RuntimeError(error) 2025-12-04T09:59:13.6847592Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6847716Z Traceback (most recent call last): 2025-12-04T09:59:13.6848254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6848390Z getattr(self, test_name)() 2025-12-04T09:59:13.6848911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6849010Z fn() 2025-12-04T09:59:13.6849502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6849607Z method(*args, **kwargs) 2025-12-04T09:59:13.6850107Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6850433Z method(*args, **kwargs) 2025-12-04T09:59:13.6851031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6851120Z with policy(): 2025-12-04T09:59:13.6851576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6851689Z raise RuntimeError(msg) 2025-12-04T09:59:13.6852766Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 2025-12-04T09:59:13.6852772Z 2025-12-04T09:59:13.6852974Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6853578Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6853585Z 2025-12-04T09:59:13.6853822Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6853837Z 2025-12-04T09:59:13.6853841Z 2025-12-04T09:59:13.6854041Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6854277Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6855027Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc1573489c80017b.xml - 2025-12-04T09:59:13.6855181Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6855942Z FAILED [10.2022s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6856051Z Traceback (most recent call last): 2025-12-04T09:59:13.6856621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6856917Z getattr(self, test_name)() 2025-12-04T09:59:13.6857463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6857555Z fn() 2025-12-04T09:59:13.6858121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6858233Z method(*args, **kwargs) 2025-12-04T09:59:13.6858761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6858870Z method(*args, **kwargs) 2025-12-04T09:59:13.6859379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6859493Z with policy(): 2025-12-04T09:59:13.6860009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6860122Z raise RuntimeError(msg) 2025-12-04T09:59:13.6861378Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 2025-12-04T09:59:13.6861386Z 2025-12-04T09:59:13.6861604Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6862300Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6862307Z 2025-12-04T09:59:13.6862572Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6862802Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6862983Z ====================== 1 failed, 26 deselected in 10.42s ======================= 2025-12-04T09:59:13.6863084Z Got exit code 1 2025-12-04T09:59:13.6863703Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6864134Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.6864828Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4d2b72d464b1c339.xml 2025-12-04T09:59:13.6865001Z ============================= test session starts ============================== 2025-12-04T09:59:13.6865357Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6865480Z cachedir: .pytest_cache 2025-12-04T09:59:13.6865998Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6866121Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6866241Z configfile: pytest.ini 2025-12-04T09:59:13.6866776Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6867006Z collecting ... collected 60 items / 16 deselected / 44 selected 2025-12-04T09:59:13.6867176Z stepcurrent: skipping 16 already run items. 2025-12-04T09:59:13.6867291Z Running 11 items in this shard 2025-12-04T09:59:13.6867296Z 2025-12-04T09:59:13.6868525Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda I1204 09:50:39.784000 70488 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 70540 2025-12-04T09:59:13.6869123Z I1204 09:50:39.785000 70488 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 70541 2025-12-04T09:59:13.6869580Z I1204 09:50:39.786000 70488 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 70542 2025-12-04T09:59:13.6870199Z I1204 09:50:39.786000 70488 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 70543 2025-12-04T09:59:13.6871176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6871316Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6873444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6873591Z _warn_cpu_init() 2025-12-04T09:59:13.6874558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6874702Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6876661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6876844Z _warn_cpu_init() 2025-12-04T09:59:13.6877831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6878064Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6879045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6879260Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6880222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6880353Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6882360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6882462Z _warn_cpu_init() 2025-12-04T09:59:13.6883431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6883558Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6884525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6884771Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6886816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6886925Z _warn_cpu_init() 2025-12-04T09:59:13.6887858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6888082Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6888816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6888950Z return func(*args, **kwargs) 2025-12-04T09:59:13.6889794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6889898Z return func(*args, **kwargs) 2025-12-04T09:59:13.6890581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6890681Z return func(*args, **kwargs) 2025-12-04T09:59:13.6891383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6891489Z return func(*args, **kwargs) 2025-12-04T09:59:13.6892165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6892280Z return func(*args, **kwargs) 2025-12-04T09:59:13.6892953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6893052Z return func(*args, **kwargs) 2025-12-04T09:59:13.6893735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6893837Z return func(*args, **kwargs) 2025-12-04T09:59:13.6894524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.6894623Z return func(*args, **kwargs) 2025-12-04T09:59:13.6895533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.6895645Z return func(*args, **kwargs) 2025-12-04T09:59:13.6896054Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6896631Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6897805Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6898322Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6899367Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6899766Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6900737Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6901232Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6902206Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6902744Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6903704Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6904164Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6905129Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6905665Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6907520Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 0. 
CUDA driver allocated memory was 711917568 and is now 785317888. 2025-12-04T09:59:13.6907902Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6908678Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6910000Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6910345Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6911028Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6911534Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6911939Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6912431Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6913354Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6913811Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6914708Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6915065Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6915937Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6916397Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6917259Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6917693Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6918545Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6918989Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:59:13.6919852Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6920309Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6922386Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 1. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.6922770Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6923429Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6924809Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6925188Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6925912Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6926474Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6926930Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6927512Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6928525Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6929035Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6930038Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6930439Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6931456Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6931952Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:59:13.6932924Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6933411Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6934472Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6934914Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6936013Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6936588Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6938597Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 3. CUDA driver allocated memory was 607059968 and is now 676265984. 2025-12-04T09:59:13.6938980Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6939676Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6940990Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6941365Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6942087Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6942682Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6943134Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6943674Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6944679Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6945193Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T09:59:13.6946195Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6946628Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6947594Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6948078Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6949279Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6949719Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6950573Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6950979Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6951833Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6952285Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6953962Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 2. CUDA driver allocated memory was 609157120 and is now 676265984. 
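The first failing run above also ended with a ProcessGroupNCCL warning that destroy_process_group() was not called before program exit, which can itself leak resources. As a minimal single-process illustration only (gloo backend and a hypothetical local TCP rendezvous; the failing tests themselves use NCCL across 4 GPUs), the requested cleanup looks like:

import torch.distributed as dist

def main() -> None:
    dist.init_process_group(
        backend="gloo",                       # the real tests use nccl on 4 GPUs
        init_method="tcp://127.0.0.1:29500",  # hypothetical local rendezvous address
        rank=0,
        world_size=1,
    )
    try:
        dist.barrier()  # stand-in for the distributed test body
    finally:
        dist.destroy_process_group()  # avoids the resource-leak warning at exit

if __name__ == "__main__":
    main()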
2025-12-04T09:59:13.6954301Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6954884Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6956061Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6956390Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6957282Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6957808Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6957907Z dist init r=0, world=4 2025-12-04T09:59:13.6958017Z dist init r=1, world=4 2025-12-04T09:59:13.6958111Z dist init r=2, world=4 2025-12-04T09:59:13.6958207Z dist init r=3, world=4 2025-12-04T09:59:13.6959310Z [rank0]:[W1204 09:50:47.274602847 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6959435Z FAILED [9.8094s] [ 9%] 2025-12-04T09:59:13.6959441Z 2025-12-04T09:59:13.6959584Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6960047Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda _ 2025-12-04T09:59:13.6960163Z Traceback (most recent call last): 2025-12-04T09:59:13.6960688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6960794Z self._join_processes(fn) 2025-12-04T09:59:13.6961349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6961521Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6962093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6962215Z raise RuntimeError(error) 2025-12-04T09:59:13.6962438Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.6962556Z Traceback (most recent call last): 2025-12-04T09:59:13.6963077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6963183Z getattr(self, test_name)() 2025-12-04T09:59:13.6963687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6963785Z fn() 2025-12-04T09:59:13.6964263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6964380Z method(*args, **kwargs) 2025-12-04T09:59:13.6964859Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6964962Z method(*args, **kwargs) 2025-12-04T09:59:13.6965450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6965546Z with policy(): 2025-12-04T09:59:13.6966057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6966173Z raise RuntimeError(msg) 2025-12-04T09:59:13.6967493Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 1. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.6967501Z 2025-12-04T09:59:13.6967720Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6968534Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6968541Z 2025-12-04T09:59:13.6968833Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6968838Z 2025-12-04T09:59:13.6968845Z 2025-12-04T09:59:13.6969050Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6969295Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6970072Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4d2b72d464b1c339.xml - 2025-12-04T09:59:13.6970235Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6971211Z FAILED [9.8094s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.6971354Z Traceback (most recent call last): 2025-12-04T09:59:13.6971972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6972089Z getattr(self, test_name)() 2025-12-04T09:59:13.6972563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6972659Z fn() 2025-12-04T09:59:13.6973107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6973201Z method(*args, **kwargs) 2025-12-04T09:59:13.6974169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6974265Z method(*args, **kwargs) 2025-12-04T09:59:13.6974713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6974817Z with policy(): 2025-12-04T09:59:13.6975274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6975391Z raise RuntimeError(msg) 2025-12-04T09:59:13.6976900Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 1. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.6976912Z 2025-12-04T09:59:13.6977135Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6978012Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6978020Z 2025-12-04T09:59:13.6978289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6978489Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6978710Z ====================== 1 failed, 16 deselected in 10.03s ======================= 2025-12-04T09:59:13.6978808Z Got exit code 1 2025-12-04T09:59:13.6978929Z Retrying single test... 2025-12-04T09:59:13.6979553Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65dbafa4918c0ef1.xml 2025-12-04T09:59:13.6979727Z ============================= test session starts ============================== 2025-12-04T09:59:13.6980079Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6980189Z cachedir: .pytest_cache 2025-12-04T09:59:13.6980714Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6980841Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6980948Z configfile: pytest.ini 2025-12-04T09:59:13.6981533Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6981758Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6982722Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6982837Z Running 1 items in this shard 2025-12-04T09:59:13.6982845Z 2025-12-04T09:59:13.6984072Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda I1204 09:50:54.283000 70825 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 70877 2025-12-04T09:59:13.6984616Z I1204 09:50:54.284000 70825 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 70878 2025-12-04T09:59:13.6985115Z I1204 09:50:54.285000 70825 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 70879 2025-12-04T09:59:13.6985621Z I1204 09:50:54.286000 70825 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 70880 2025-12-04T09:59:13.6986628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6986810Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6988954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6989169Z _warn_cpu_init() 2025-12-04T09:59:13.6990124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6990333Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6991281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6991411Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6993358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6993455Z _warn_cpu_init() 2025-12-04T09:59:13.6994574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6994722Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6996705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6996817Z _warn_cpu_init() 2025-12-04T09:59:13.6997788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6998015Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6998974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6999133Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7001088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7001185Z _warn_cpu_init() 2025-12-04T09:59:13.7002157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7002399Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7003375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7003588Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7004336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7004456Z return func(*args, **kwargs) 2025-12-04T09:59:13.7005203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7005328Z return func(*args, **kwargs) 2025-12-04T09:59:13.7006072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7006182Z return func(*args, **kwargs) 2025-12-04T09:59:13.7006979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7007088Z return func(*args, **kwargs) 2025-12-04T09:59:13.7007833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7007938Z return func(*args, **kwargs) 2025-12-04T09:59:13.7008671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7008791Z return func(*args, **kwargs) 2025-12-04T09:59:13.7009524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7009647Z return func(*args, **kwargs) 2025-12-04T09:59:13.7010518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.7010735Z return func(*args, **kwargs) 2025-12-04T09:59:13.7011633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7011731Z return func(*args, **kwargs) 2025-12-04T09:59:13.7012156Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7012831Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7013811Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7014301Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7015232Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7015639Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7016626Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7017296Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7018270Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7018759Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7019727Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7020174Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7021351Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7021920Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7023778Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. 
CUDA driver allocated memory was 609157120 and is now 676265984. 2025-12-04T09:59:13.7024149Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7024850Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7026177Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7026544Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7027275Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7027821Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7028326Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7028866Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7029877Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7030398Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7031426Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7031835Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7032797Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7033351Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7034218Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7034651Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7035517Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7035947Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:59:13.7036817Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7037253Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7038896Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 714014720 and is now 785317888. 2025-12-04T09:59:13.7039252Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7039838Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7041005Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7041329Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7042001Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7042486Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7042895Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7043363Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7044250Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7044740Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7045620Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7045986Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7046841Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7047281Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:59:13.7048133Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7048570Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7049453Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7049853Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7050721Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7051160Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7052839Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.7053161Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7053747Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7054921Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7055274Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7055923Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7056478Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7057087Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7057675Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7058675Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7059197Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T09:59:13.7060184Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7060589Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7061556Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7062057Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7063044Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7063532Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7064505Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7064956Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7065925Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7066452Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7068314Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 3. CUDA driver allocated memory was 611254272 and is now 676265984. 
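For reference, the two figures in each leak-check RuntimeError above are a before/after comparison per device: the first pair is the caching-allocator allocation count, the second the driver-level allocation. A hand-rolled way to sample comparable numbers is sketched below; this is only an illustration (the helper name cuda_memory_snapshot is made up here, and the actual checker in common_utils.py may derive its driver-side figure differently).

    import torch

    def cuda_memory_snapshot(device: int) -> tuple[int, int]:
        """Return (caching-allocator bytes, driver-level bytes in use) for one GPU."""
        torch.cuda.synchronize(device)
        allocator_bytes = torch.cuda.memory_allocated(device)      # caching-allocator figure
        free_bytes, total_bytes = torch.cuda.mem_get_info(device)  # driver-level free/total
        driver_bytes = total_bytes - free_bytes
        return allocator_bytes, driver_bytes

Diffing one snapshot taken before the test body against one taken after is roughly what running the printed repro command with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 automates.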
2025-12-04T09:59:13.7068679Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7069402Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7070678Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7071003Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7071651Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7072163Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7072267Z dist init r=0, world=4 2025-12-04T09:59:13.7072356Z dist init r=2, world=4 2025-12-04T09:59:13.7072448Z dist init r=1, world=4 2025-12-04T09:59:13.7072551Z dist init r=3, world=4 2025-12-04T09:59:13.7073585Z [rank0]:[W1204 09:51:02.772088461 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7073674Z FAILED [9.7993s] [100%] 2025-12-04T09:59:13.7073679Z 2025-12-04T09:59:13.7073820Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7074243Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda _ 2025-12-04T09:59:13.7074362Z Traceback (most recent call last): 2025-12-04T09:59:13.7074847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7074947Z self._join_processes(fn) 2025-12-04T09:59:13.7075474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7075605Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7076175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7076279Z raise RuntimeError(error) 2025-12-04T09:59:13.7076488Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7076605Z Traceback (most recent call last): 2025-12-04T09:59:13.7077084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7077183Z getattr(self, test_name)() 2025-12-04T09:59:13.7077665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7077745Z fn() 2025-12-04T09:59:13.7078204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7078296Z method(*args, **kwargs) 2025-12-04T09:59:13.7078773Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7078877Z method(*args, **kwargs) 2025-12-04T09:59:13.7079323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7079408Z with policy(): 2025-12-04T09:59:13.7079871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7079968Z raise RuntimeError(msg) 2025-12-04T09:59:13.7081206Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.7081238Z 2025-12-04T09:59:13.7081436Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7082205Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7082210Z 2025-12-04T09:59:13.7082446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7082451Z 2025-12-04T09:59:13.7082455Z 2025-12-04T09:59:13.7082651Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7082915Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.7083630Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65dbafa4918c0ef1.xml - 2025-12-04T09:59:13.7083793Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7084700Z FAILED [9.7993s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7084807Z Traceback (most recent call last): 2025-12-04T09:59:13.7085306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7085406Z getattr(self, test_name)() 2025-12-04T09:59:13.7085899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7085979Z fn() 2025-12-04T09:59:13.7086435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7086541Z method(*args, **kwargs) 2025-12-04T09:59:13.7086991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7087109Z method(*args, **kwargs) 2025-12-04T09:59:13.7087568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7087655Z with policy(): 2025-12-04T09:59:13.7088114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7088210Z raise RuntimeError(msg) 2025-12-04T09:59:13.7089454Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.7089473Z 2025-12-04T09:59:13.7089666Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7090448Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7090453Z 2025-12-04T09:59:13.7090697Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7090857Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.7091020Z ====================== 1 failed, 26 deselected in 10.02s ======================= 2025-12-04T09:59:13.7091119Z Got exit code 1 2025-12-04T09:59:13.7091215Z Retrying single test... 2025-12-04T09:59:13.7091780Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8e1f7dea233320.xml 2025-12-04T09:59:13.7091949Z ============================= test session starts ============================== 2025-12-04T09:59:13.7092264Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7092372Z cachedir: .pytest_cache 2025-12-04T09:59:13.7092828Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7092936Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7093041Z configfile: pytest.ini 2025-12-04T09:59:13.7093517Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7093747Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.7094575Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7094680Z Running 1 items in this shard 2025-12-04T09:59:13.7094685Z 2025-12-04T09:59:13.7095780Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda I1204 09:51:08.724000 71162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 71214 2025-12-04T09:59:13.7096222Z I1204 09:51:08.725000 71162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 71215 2025-12-04T09:59:13.7096932Z I1204 09:51:08.726000 71162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 71216 2025-12-04T09:59:13.7097434Z I1204 09:51:08.727000 71162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 71217 2025-12-04T09:59:13.7098450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7098589Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7100658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7100768Z _warn_cpu_init() 2025-12-04T09:59:13.7101765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7101910Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7103948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7104062Z _warn_cpu_init() 2025-12-04T09:59:13.7105055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7105279Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7106327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7106549Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7107651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7107979Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7110153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7110379Z _warn_cpu_init() 2025-12-04T09:59:13.7111297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7111454Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7113321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7113487Z _warn_cpu_init() 2025-12-04T09:59:13.7114467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7114736Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7115700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7115930Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7116635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7124012Z return func(*args, **kwargs) 2025-12-04T09:59:13.7124917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7125165Z return func(*args, **kwargs) 2025-12-04T09:59:13.7125943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7126055Z return func(*args, **kwargs) 2025-12-04T09:59:13.7126833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7126948Z return func(*args, **kwargs) 2025-12-04T09:59:13.7127728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7127885Z return func(*args, **kwargs) 2025-12-04T09:59:13.7128645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7128764Z return func(*args, **kwargs) 2025-12-04T09:59:13.7129521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7129632Z return func(*args, **kwargs) 2025-12-04T09:59:13.7130398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.7130556Z return func(*args, **kwargs) 2025-12-04T09:59:13.7131563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7131674Z return func(*args, **kwargs) 2025-12-04T09:59:13.7132140Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7132693Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7133786Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7134261Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7135320Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7135715Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7136820Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7137567Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7138538Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7139024Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7140025Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7140476Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7141453Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7141944Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7143798Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 0. 
CUDA driver allocated memory was 714014720 and is now 785317888. 2025-12-04T09:59:13.7144211Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7144874Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7146193Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7146594Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7147325Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7147874Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7148328Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7148979Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7150281Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7150751Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7151659Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7152027Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7152878Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7153313Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7154178Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7154638Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7155498Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7155894Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:59:13.7156759Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7157293Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7158941Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 1. CUDA driver allocated memory was 607059968 and is now 676265984. 2025-12-04T09:59:13.7159265Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7159856Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7161052Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7161384Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7162032Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7162516Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7162924Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7163397Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7164299Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7164780Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7165659Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7166016Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7166870Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7167302Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:59:13.7168190Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7168623Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7169491Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7169888Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7170777Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7171216Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7172855Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 611254272 and is now 676265984. 2025-12-04T09:59:13.7173205Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7173795Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7174970Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7175297Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7175941Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7176506Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7177123Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7177664Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7178704Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7179228Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T09:59:13.7180215Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7180628Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7181612Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7182099Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7183070Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7183557Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7184524Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7184999Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7185976Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7186463Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7188321Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 3. CUDA driver allocated memory was 609157120 and is now 676265984. 
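Separately from the leak itself, both runs repeatedly emit the `_warn_cpu_init()` UserWarning because the module is wrapped while it still lives on CPU. The remedy the warning recommends, passing `device_id` to FSDP, is sketched below under the assumptions that an NCCL process group is already initialized and that the wrapper function name is purely illustrative.

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module: nn.Module, rank: int) -> FSDP:
        # `device_id` lets FSDP move the module to this GPU before running sharding
        # initialization, which also satisfies the `sync_module_states=True` requirement.
        # (The FutureWarning above separately suggests plain DistributedDataParallel
        # when NO_SHARD is the intended strategy.)
        return FSDP(
            module,
            device_id=torch.device("cuda", rank),
            sync_module_states=True,
        )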
2025-12-04T09:59:13.7188734Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7189450Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7190620Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7190944Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7191586Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7192073Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7192206Z dist init r=0, world=4 2025-12-04T09:59:13.7192297Z dist init r=1, world=4 2025-12-04T09:59:13.7192382Z dist init r=2, world=4 2025-12-04T09:59:13.7192481Z dist init r=3, world=4 2025-12-04T09:59:13.7193515Z [rank0]:[W1204 09:51:16.226995320 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7193609Z FAILED [10.2026s] [100%] 2025-12-04T09:59:13.7193616Z 2025-12-04T09:59:13.7193757Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7194181Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda _ 2025-12-04T09:59:13.7194304Z Traceback (most recent call last): 2025-12-04T09:59:13.7194816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7194919Z self._join_processes(fn) 2025-12-04T09:59:13.7195450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7195578Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7196128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7196230Z raise RuntimeError(error) 2025-12-04T09:59:13.7196439Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7196553Z Traceback (most recent call last): 2025-12-04T09:59:13.7197063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7197163Z getattr(self, test_name)() 2025-12-04T09:59:13.7197652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7197735Z fn() 2025-12-04T09:59:13.7198188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7198281Z method(*args, **kwargs) 2025-12-04T09:59:13.7198729Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7198860Z method(*args, **kwargs) 2025-12-04T09:59:13.7199309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7199398Z with policy(): 2025-12-04T09:59:13.7199866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7199961Z raise RuntimeError(msg) 2025-12-04T09:59:13.7201211Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 0. CUDA driver allocated memory was 714014720 and is now 785317888. 2025-12-04T09:59:13.7201217Z 2025-12-04T09:59:13.7201413Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7202178Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7202196Z 2025-12-04T09:59:13.7202439Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7202445Z 2025-12-04T09:59:13.7202595Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7202714Z Traceback (most recent call last): 2025-12-04T09:59:13.7203236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7203340Z getattr(self, test_name)() 2025-12-04T09:59:13.7203827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7203907Z fn() 2025-12-04T09:59:13.7204368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7204466Z method(*args, **kwargs) 2025-12-04T09:59:13.7204913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7205012Z method(*args, **kwargs) 2025-12-04T09:59:13.7205459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7205545Z with policy(): 2025-12-04T09:59:13.7206030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7206128Z raise RuntimeError(msg) 2025-12-04T09:59:13.7207371Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 1. CUDA driver allocated memory was 607059968 and is now 676265984. 
2025-12-04T09:59:13.7207378Z 2025-12-04T09:59:13.7207572Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7208341Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7208373Z 2025-12-04T09:59:13.7208614Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7208619Z 2025-12-04T09:59:13.7208767Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7208884Z Traceback (most recent call last): 2025-12-04T09:59:13.7209365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7209473Z getattr(self, test_name)() 2025-12-04T09:59:13.7209942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7210046Z fn() 2025-12-04T09:59:13.7210505Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7210597Z method(*args, **kwargs) 2025-12-04T09:59:13.7211052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7211156Z method(*args, **kwargs) 2025-12-04T09:59:13.7211606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7211699Z with policy(): 2025-12-04T09:59:13.7212147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7212246Z raise RuntimeError(msg) 2025-12-04T09:59:13.7213476Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 611254272 and is now 676265984. 2025-12-04T09:59:13.7213485Z 2025-12-04T09:59:13.7213674Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7214437Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7214466Z 2025-12-04T09:59:13.7214703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7214708Z 2025-12-04T09:59:13.7214711Z 2025-12-04T09:59:13.7214920Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7215151Z Process 0 terminated with exit code 10, terminating remaining processes. 
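The failures above all come from PyTorch's CUDA memory-leak check (the same check the printed repro enables via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1): it compares caching-allocator and driver memory usage before and after each test and raises if usage grew, e.g. 512 before vs. 150016 after on device 0. A rough sketch of that kind of before/after comparison, assuming a single visible CUDA device; the helper name is illustrative only and is not the actual leak-check policy in common_utils.py:

    import torch

    def run_with_cuda_leak_check(test_fn, device=0):
        # Illustrative only: snapshot caching-allocator usage before the test.
        torch.cuda.synchronize(device)
        before = torch.cuda.memory_allocated(device)
        test_fn()
        # Drop cached blocks so only genuinely live allocations are counted.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible CUDA leak: caching allocator went from {before} "
                f"to {after} on device {device}"
            )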
2025-12-04T09:59:13.7215863Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8e1f7dea233320.xml - 2025-12-04T09:59:13.7216031Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7217272Z FAILED [10.2026s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7217408Z Traceback (most recent call last): 2025-12-04T09:59:13.7217966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7218079Z getattr(self, test_name)() 2025-12-04T09:59:13.7218626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7218713Z fn() 2025-12-04T09:59:13.7219230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7219336Z method(*args, **kwargs) 2025-12-04T09:59:13.7219840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7219981Z method(*args, **kwargs) 2025-12-04T09:59:13.7220483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7220583Z with policy(): 2025-12-04T09:59:13.7221340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7221455Z raise RuntimeError(msg) 2025-12-04T09:59:13.7222868Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 0. CUDA driver allocated memory was 714014720 and is now 785317888. 
2025-12-04T09:59:13.7222947Z 2025-12-04T09:59:13.7223165Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7224036Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7224042Z 2025-12-04T09:59:13.7224309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7224314Z 2025-12-04T09:59:13.7224478Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7224608Z Traceback (most recent call last): 2025-12-04T09:59:13.7225156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7225277Z getattr(self, test_name)() 2025-12-04T09:59:13.7225820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7225910Z fn() 2025-12-04T09:59:13.7226420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7226528Z method(*args, **kwargs) 2025-12-04T09:59:13.7227036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7227185Z method(*args, **kwargs) 2025-12-04T09:59:13.7227696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7227797Z with policy(): 2025-12-04T09:59:13.7228302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7228413Z raise RuntimeError(msg) 2025-12-04T09:59:13.7229813Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 1. CUDA driver allocated memory was 607059968 and is now 676265984. 
2025-12-04T09:59:13.7229821Z 2025-12-04T09:59:13.7230037Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7230941Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7230947Z 2025-12-04T09:59:13.7231214Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7231220Z 2025-12-04T09:59:13.7231390Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7231522Z Traceback (most recent call last): 2025-12-04T09:59:13.7232068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7232189Z getattr(self, test_name)() 2025-12-04T09:59:13.7232722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7232983Z fn() 2025-12-04T09:59:13.7233483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7233585Z method(*args, **kwargs) 2025-12-04T09:59:13.7234062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7234169Z method(*args, **kwargs) 2025-12-04T09:59:13.7234641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7234741Z with policy(): 2025-12-04T09:59:13.7235253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7235355Z raise RuntimeError(msg) 2025-12-04T09:59:13.7236676Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 611254272 and is now 676265984. 2025-12-04T09:59:13.7236685Z 2025-12-04T09:59:13.7236889Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7237703Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7237709Z 2025-12-04T09:59:13.7237960Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7238141Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
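Each failing run above also ends with a ProcessGroupNCCL warning that destroy_process_group() was never called before the worker exited. A minimal sketch of the teardown that warning asks for, assuming a standard env:// launch (e.g. via torchrun); main() and the body are placeholders, not code from this test suite:

    import torch.distributed as dist

    def main():
        # Rendezvous via the environment variables set by the launcher.
        dist.init_process_group(backend="nccl")
        try:
            ...  # test or training body goes here
        finally:
            # Explicit teardown avoids the resource-leak warning seen in the log.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()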
2025-12-04T09:59:13.7238314Z ====================== 1 failed, 26 deselected in 10.42s =======================
2025-12-04T09:59:13.7238409Z Got exit code 1
2025-12-04T09:59:13.7239153Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda
2025-12-04T09:59:13.7239569Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:59:13.7240154Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d13641fc6f0b57c.xml
2025-12-04T09:59:13.7240316Z ============================= test session starts ==============================
2025-12-04T09:59:13.7240645Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:59:13.7240758Z cachedir: .pytest_cache
2025-12-04T09:59:13.7241244Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:59:13.7241359Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:59:13.7241473Z configfile: pytest.ini
2025-12-04T09:59:13.7242009Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:59:13.7242223Z collecting ... collected 60 items / 17 deselected / 43 selected
2025-12-04T09:59:13.7242352Z stepcurrent: skipping 17 already run items.
2025-12-04T09:59:13.7242456Z Running 10 items in this shard
2025-12-04T09:59:13.7242461Z 
2025-12-04T09:59:13.7243613Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda I1204 09:51:23.183000 71499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 71551
2025-12-04T09:59:13.7244083Z I1204 09:51:23.184000 71499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 71552
2025-12-04T09:59:13.7244663Z I1204 09:51:23.185000 71499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 71553
2025-12-04T09:59:13.7245135Z I1204 09:51:23.186000 71499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 71554
2025-12-04T09:59:13.7246938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.7247067Z _warn_cpu_init()
2025-12-04T09:59:13.7248854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.7248954Z _warn_cpu_init() 2025-12-04T09:59:13.7250727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7250825Z _warn_cpu_init() 2025-12-04T09:59:13.7252630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7252727Z _warn_cpu_init() 2025-12-04T09:59:13.7253616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7253713Z return func(*args, **kwargs) 2025-12-04T09:59:13.7254132Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7254610Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7255535Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7255988Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7257156Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7257558Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7258525Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7259061Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7260026Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7260526Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7261489Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7261977Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7262945Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7263435Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7265266Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 602865664 and is now 651100160. 2025-12-04T09:59:13.7265633Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7266302Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7267625Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7268005Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7268837Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7269451Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7269864Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7270366Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7271264Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7271715Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7272601Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7272958Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7273836Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7274279Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7275130Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7275594Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7276441Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7276846Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7277705Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7278142Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7279765Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 
2025-12-04T09:59:13.7280091Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7280707Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7281851Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7282189Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7282824Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7283344Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7283756Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7284223Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7285119Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7285572Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7286483Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7286837Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7287685Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7288128Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7289001Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7289441Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7290292Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7290697Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7291550Z [rank3]:E1204 09:51:30.896000 71554 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7291988Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7293639Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.7293962Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7294552Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7295698Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7296030Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7296953Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7297505Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7297969Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7298501Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7299506Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7300044Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7301043Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7301440Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7302397Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7302921Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7303886Z [rank2]:E1204 09:51:30.896000 71553 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7304388Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7305357Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7305802Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7306770Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7307266Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7309322Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.7309646Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7310232Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7311408Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7311736Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7312375Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7312859Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7312960Z dist init r=0, world=4 2025-12-04T09:59:13.7313047Z dist init r=2, world=4 2025-12-04T09:59:13.7313131Z dist init r=3, world=4 2025-12-04T09:59:13.7313224Z dist init r=1, world=4 2025-12-04T09:59:13.7314280Z [rank0]:[W1204 09:51:31.909434978 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7314369Z FAILED [9.2841s] [ 10%] 2025-12-04T09:59:13.7314381Z 2025-12-04T09:59:13.7314511Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7314916Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda _ 2025-12-04T09:59:13.7315029Z Traceback (most recent call last): 2025-12-04T09:59:13.7315512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7315641Z self._join_processes(fn) 2025-12-04T09:59:13.7316168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7316293Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7316841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7316943Z raise RuntimeError(error) 2025-12-04T09:59:13.7317152Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7317267Z Traceback (most recent call last): 2025-12-04T09:59:13.7317751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7317847Z getattr(self, test_name)() 2025-12-04T09:59:13.7318332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7318409Z fn() 2025-12-04T09:59:13.7318869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7318963Z method(*args, **kwargs) 2025-12-04T09:59:13.7319410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7319538Z method(*args, **kwargs) 2025-12-04T09:59:13.7319986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7320073Z with policy(): 2025-12-04T09:59:13.7320530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7320625Z raise RuntimeError(msg) 2025-12-04T09:59:13.7322274Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 602865664 and is now 651100160. 
2025-12-04T09:59:13.7322285Z 2025-12-04T09:59:13.7322499Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7323415Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7323421Z 2025-12-04T09:59:13.7323684Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7323690Z 2025-12-04T09:59:13.7323695Z 2025-12-04T09:59:13.7323912Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7324182Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.7324977Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d13641fc6f0b57c.xml - 2025-12-04T09:59:13.7325613Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7326616Z FAILED [9.2841s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7326733Z Traceback (most recent call last): 2025-12-04T09:59:13.7327292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7327399Z getattr(self, test_name)() 2025-12-04T09:59:13.7327949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7328077Z fn() 2025-12-04T09:59:13.7328583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7328695Z method(*args, **kwargs) 2025-12-04T09:59:13.7329198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7329301Z method(*args, **kwargs) 2025-12-04T09:59:13.7329812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7329905Z with policy(): 2025-12-04T09:59:13.7330422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7330529Z raise RuntimeError(msg) 2025-12-04T09:59:13.7331897Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 602865664 and is now 651100160. 2025-12-04T09:59:13.7331915Z 2025-12-04T09:59:13.7332126Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7332970Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7333030Z 2025-12-04T09:59:13.7333304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7333479Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
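The repeated _warn_cpu_init UserWarning above recommends passing device_id to FSDP so that sharding initialization runs on the GPU (which is also required for sync_module_states=True). A minimal sketch of that recommendation, assuming the torch.distributed.fsdp.FullyShardedDataParallel API; wrap_on_gpu is a hypothetical helper, not part of this test suite:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module, local_rank):
        # device_id moves a CPU-resident module to the local GPU before sharding,
        # which silences the _warn_cpu_init warning shown in the log above.
        return FSDP(
            module,
            device_id=torch.device("cuda", local_rank),
            sync_module_states=True,
        )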
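Similarly, the barrier() UserWarning from c10d_logger.py suggests binding the process group to a device at init time. A minimal sketch, assuming a recent PyTorch where init_process_group accepts device_id; LOCAL_RANK is the variable a torchrun-style launcher would set:

    import os
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    # Binding the group to a device mutes the "barrier(): using the device
    # under current context" warning and lets barrier() pick the right GPU.
    dist.init_process_group(backend="nccl", device_id=torch.device("cuda", local_rank))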
2025-12-04T09:59:13.7333761Z ======================= 1 failed, 17 deselected in 9.50s =======================
2025-12-04T09:59:13.7333859Z Got exit code 1
2025-12-04T09:59:13.7333955Z Retrying single test...
2025-12-04T09:59:13.7334551Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29e66d82c97dbaa5.xml
2025-12-04T09:59:13.7334699Z ============================= test session starts ==============================
2025-12-04T09:59:13.7335026Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:59:13.7335135Z cachedir: .pytest_cache
2025-12-04T09:59:13.7335651Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:59:13.7335772Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:59:13.7335870Z configfile: pytest.ini
2025-12-04T09:59:13.7336450Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:59:13.7336831Z collecting ... collected 60 items / 26 deselected / 34 selected
2025-12-04T09:59:13.7337762Z stepcurrent: skipping 17 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda
2025-12-04T09:59:13.7337908Z Running 1 items in this shard
2025-12-04T09:59:13.7337914Z 
2025-12-04T09:59:13.7339134Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda I1204 09:51:37.414000 71836 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 71888
2025-12-04T09:59:13.7339629Z I1204 09:51:37.415000 71836 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 71889
2025-12-04T09:59:13.7340125Z I1204 09:51:37.416000 71836 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 71890
2025-12-04T09:59:13.7340610Z I1204 09:51:37.417000 71836 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 71891
2025-12-04T09:59:13.7342686Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.7342786Z _warn_cpu_init()
2025-12-04T09:59:13.7344820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.7344920Z _warn_cpu_init() 2025-12-04T09:59:13.7346954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7347061Z _warn_cpu_init() 2025-12-04T09:59:13.7349164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7349274Z _warn_cpu_init() 2025-12-04T09:59:13.7350246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7350399Z return func(*args, **kwargs) 2025-12-04T09:59:13.7350847Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7351365Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7352533Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7352989Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7353908Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7354261Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7355122Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7355559Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7356437Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7357055Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7357961Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7358388Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7359294Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7359764Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7361510Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.7361863Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7362479Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7363703Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7364057Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7364825Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7365348Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7365772Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7366278Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7367227Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7367733Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7368771Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7369122Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7369982Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7370441Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7371296Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7371737Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7372585Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7372986Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7373842Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7374288Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7375926Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 0. CUDA driver allocated memory was 718209024 and is now 760152064. 
2025-12-04T09:59:13.7376260Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7377127Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7378466Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7378840Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7379553Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7380110Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7380556Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7381090Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7382126Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7382628Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7383625Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7384062Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7385038Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7385527Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7386485Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7386976Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7387932Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7388384Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7389542Z [rank3]:E1204 09:51:45.279000 71891 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7389984Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7391590Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 3. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.7391923Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7392533Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7393677Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7394010Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7394644Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7395136Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7395561Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7396038Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7396923Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7397373Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7398280Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7398635Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7399493Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7399923Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7400775Z [rank1]:E1204 09:51:45.279000 71889 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7401206Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7402061Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7402484Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7403338Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7403775Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7405411Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.7405741Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7406323Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7407469Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7407803Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7408465Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7408955Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7409043Z dist init r=0, world=4 2025-12-04T09:59:13.7409128Z dist init r=1, world=4 2025-12-04T09:59:13.7409217Z dist init r=2, world=4 2025-12-04T09:59:13.7409300Z dist init r=3, world=4 2025-12-04T09:59:13.7410344Z [rank0]:[W1204 09:51:45.298126885 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7410459Z FAILED [9.3877s] [100%] 2025-12-04T09:59:13.7410465Z 2025-12-04T09:59:13.7410597Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7411014Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda _ 2025-12-04T09:59:13.7411122Z Traceback (most recent call last): 2025-12-04T09:59:13.7411610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7411708Z self._join_processes(fn) 2025-12-04T09:59:13.7412222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7412356Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7412888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7412993Z raise RuntimeError(error) 2025-12-04T09:59:13.7413208Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7413314Z Traceback (most recent call last): 2025-12-04T09:59:13.7413801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7413898Z getattr(self, test_name)() 2025-12-04T09:59:13.7414398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7414486Z fn() 2025-12-04T09:59:13.7414933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7415023Z method(*args, **kwargs) 2025-12-04T09:59:13.7415473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7415567Z method(*args, **kwargs) 2025-12-04T09:59:13.7416019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7416108Z with policy(): 2025-12-04T09:59:13.7416666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7416955Z raise RuntimeError(msg) 2025-12-04T09:59:13.7418333Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 
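The RuntimeError above comes from the harness's CUDA memory-leak check (the __exit__ in torch/testing/_internal/common_utils.py shown in the traceback): it snapshots per-device memory before the test body runs and compares afterwards, here 512 -> 51712 bytes in the caching allocator and 607059968 -> 651100160 bytes at the driver level on device 2. A minimal sketch of that kind of before/after comparison, not the in-tree implementation:

    import torch

    def driver_allocated_bytes(device: int) -> int:
        # cudaMemGetInfo-style view: (free, total) as seen by the CUDA driver.
        # Note: total - free counts every process on the device, so this is only
        # a rough proxy for what the real check reports.
        free, total = torch.cuda.mem_get_info(device)
        return total - free

    def check_for_leak(test_body, device: int = 0) -> None:
        # Hypothetical helper mirroring the before/after comparison in the error text.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        driver_before = driver_allocated_bytes(device)
        test_body()
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        driver_after = driver_allocated_bytes(device)
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after}, driver {driver_before} -> {driver_after}"
            )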
2025-12-04T09:59:13.7418340Z 2025-12-04T09:59:13.7418557Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7419405Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7419442Z 2025-12-04T09:59:13.7419709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7419723Z 2025-12-04T09:59:13.7419727Z 2025-12-04T09:59:13.7419951Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7420217Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.7421244Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29e66d82c97dbaa5.xml - 2025-12-04T09:59:13.7421420Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7422435Z FAILED [9.3877s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7422627Z Traceback (most recent call last): 2025-12-04T09:59:13.7423179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7423297Z getattr(self, test_name)() 2025-12-04T09:59:13.7423843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7423928Z fn() 2025-12-04T09:59:13.7424447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7424549Z method(*args, **kwargs) 2025-12-04T09:59:13.7425055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7425160Z method(*args, **kwargs) 2025-12-04T09:59:13.7425659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7425764Z with policy(): 2025-12-04T09:59:13.7426272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7426378Z raise RuntimeError(msg) 2025-12-04T09:59:13.7427800Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.7427807Z 2025-12-04T09:59:13.7428021Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7428870Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7428878Z 2025-12-04T09:59:13.7429139Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7429331Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
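The repro line printed above can be scripted directly. A small sketch that runs the single failing test from the base repo dir with the leak check enabled (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1) and, optionally, the repro banner suppressed (PYTORCH_PRINT_REPRO_ON_FAILURE=0); the subprocess wrapper is illustrative, only the command and env var names come from the log:

    import os
    import subprocess

    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"  # uncomment to silence the repro banner

    cmd = [
        "python",
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda",
    ]
    result = subprocess.run(cmd, env=env)  # run from the base repo dir
    print("exit code:", result.returncode)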
2025-12-04T09:59:13.7429540Z ======================= 1 failed, 26 deselected in 9.60s ======================= 2025-12-04T09:59:13.7429634Z Got exit code 1 2025-12-04T09:59:13.7429748Z Retrying single test... 2025-12-04T09:59:13.7430369Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a798bbedf3e7b999.xml 2025-12-04T09:59:13.7430535Z ============================= test session starts ============================== 2025-12-04T09:59:13.7430878Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7430986Z cachedir: .pytest_cache 2025-12-04T09:59:13.7431510Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7431627Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7431788Z configfile: pytest.ini 2025-12-04T09:59:13.7432326Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7432655Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.7433528Z stepcurrent: skipping 17 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7433635Z Running 1 items in this shard 2025-12-04T09:59:13.7433639Z 2025-12-04T09:59:13.7434771Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda I1204 09:51:51.794000 72173 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 72225 2025-12-04T09:59:13.7435279Z I1204 09:51:51.795000 72173 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 72226 2025-12-04T09:59:13.7435741Z I1204 09:51:51.796000 72173 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 72227 2025-12-04T09:59:13.7436209Z I1204 09:51:51.796000 72173 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 72228 2025-12-04T09:59:13.7438106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7438206Z _warn_cpu_init() 2025-12-04T09:59:13.7440116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
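The UserWarning just above recommends giving FSDP a `device_id` so sharding initialization runs on GPU rather than CPU, and notes that `sync_module_states=True` requires the module on GPU anyway. A hedged sketch of that call pattern (model and rank are placeholders):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module: torch.nn.Module, rank: int) -> FSDP:
        # device_id moves the module to this rank's GPU for sharding init,
        # avoiding the CPU-init warning and satisfying sync_module_states=True.
        return FSDP(module, device_id=torch.device("cuda", rank), sync_module_states=True)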
2025-12-04T09:59:13.7440216Z _warn_cpu_init() 2025-12-04T09:59:13.7442105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7442199Z _warn_cpu_init() 2025-12-04T09:59:13.7444189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7444277Z _warn_cpu_init() 2025-12-04T09:59:13.7445167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7445264Z return func(*args, **kwargs) 2025-12-04T09:59:13.7445682Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7446179Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7447070Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7447533Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7448408Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7448796Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7449643Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7450084Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7450937Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7451366Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7452220Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7452621Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7453506Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7453946Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7455570Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T09:59:13.7455900Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7456599Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7458069Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7458437Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7459163Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7459711Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7460214Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7460745Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7461750Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7462260Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7463273Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7463682Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7464647Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7465140Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7466096Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7466582Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7467548Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7468020Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7469098Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7469559Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7471315Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.7471657Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7472276Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7473506Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7473845Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7474551Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7475179Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7475592Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7476060Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7476950Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7477442Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7478319Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7478681Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7479534Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7479971Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7480821Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7481253Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7482135Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7482532Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7483390Z [rank2]:E1204 09:51:59.451000 72227 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7483826Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7485495Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.7485820Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7486405Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7487558Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7487917Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7488562Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7489044Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7489452Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7489945Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7490829Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7491292Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7492160Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7492517Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7493372Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7493814Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7494691Z [rank3]:E1204 09:51:59.452000 72228 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7495121Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7495974Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7496435Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7497546Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7498082Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7499913Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.7500278Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7500942Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7502273Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7502636Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7503355Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7503927Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7504034Z dist init r=1, world=4 2025-12-04T09:59:13.7504135Z dist init r=2, world=4 2025-12-04T09:59:13.7504230Z dist init r=0, world=4 2025-12-04T09:59:13.7504336Z dist init r=3, world=4 2025-12-04T09:59:13.7505496Z [rank0]:[W1204 09:51:59.469554411 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7505598Z FAILED [10.2161s] [100%] 2025-12-04T09:59:13.7505609Z 2025-12-04T09:59:13.7505757Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7506212Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda _ 2025-12-04T09:59:13.7506340Z Traceback (most recent call last): 2025-12-04T09:59:13.7506883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7506996Z self._join_processes(fn) 2025-12-04T09:59:13.7507589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7507732Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7508379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7508492Z raise RuntimeError(error) 2025-12-04T09:59:13.7508839Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7508969Z Traceback (most recent call last): 2025-12-04T09:59:13.7509572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7509673Z getattr(self, test_name)() 2025-12-04T09:59:13.7510160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7510241Z fn() 2025-12-04T09:59:13.7510695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7510818Z method(*args, **kwargs) 2025-12-04T09:59:13.7511266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7511366Z method(*args, **kwargs) 2025-12-04T09:59:13.7511808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7511894Z with policy(): 2025-12-04T09:59:13.7512355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7512454Z raise RuntimeError(msg) 2025-12-04T09:59:13.7513682Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
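The repeated ProcessGroupNCCL warning ("destroy_process_group() was not called before program exit") and the earlier barrier() warning both concern process-group lifecycle. A minimal sketch of the init/teardown pattern they ask for, assuming one GPU per rank and rendezvous env vars set by torchrun or similar; passing device_id to init_process_group is what the barrier() warning suggests and is only available on recent PyTorch releases:

    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        rank = int(os.environ["RANK"])  # RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT assumed set
        torch.cuda.set_device(rank)
        # Binding the group to a device silences the "barrier(): using the device
        # under current context" warning.
        dist.init_process_group("nccl", device_id=torch.device("cuda", rank))
        dist.barrier()
        # ... test or training body ...
        dist.destroy_process_group()  # explicit shutdown, as the NCCL warning requests

    if __name__ == "__main__":
        main()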
2025-12-04T09:59:13.7513716Z 2025-12-04T09:59:13.7513906Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7514670Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7514675Z 2025-12-04T09:59:13.7514912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7514917Z 2025-12-04T09:59:13.7515060Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7515204Z Traceback (most recent call last): 2025-12-04T09:59:13.7515691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7515804Z getattr(self, test_name)() 2025-12-04T09:59:13.7516279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7516360Z fn() 2025-12-04T09:59:13.7516821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7516915Z method(*args, **kwargs) 2025-12-04T09:59:13.7517363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7517460Z method(*args, **kwargs) 2025-12-04T09:59:13.7517905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7518001Z with policy(): 2025-12-04T09:59:13.7518449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7518544Z raise RuntimeError(msg) 2025-12-04T09:59:13.7519801Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.7519807Z 2025-12-04T09:59:13.7519999Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7520886Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7520897Z 2025-12-04T09:59:13.7521144Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7521148Z 2025-12-04T09:59:13.7521485Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7521619Z Traceback (most recent call last): 2025-12-04T09:59:13.7522182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7522361Z getattr(self, test_name)() 2025-12-04T09:59:13.7522901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7522989Z fn() 2025-12-04T09:59:13.7523503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7523608Z method(*args, **kwargs) 2025-12-04T09:59:13.7524118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7524223Z method(*args, **kwargs) 2025-12-04T09:59:13.7524720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7524864Z with policy(): 2025-12-04T09:59:13.7525368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7525481Z raise RuntimeError(msg) 2025-12-04T09:59:13.7526869Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.7526875Z 2025-12-04T09:59:13.7527085Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7527966Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7527972Z 2025-12-04T09:59:13.7528234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7528242Z 2025-12-04T09:59:13.7528246Z 2025-12-04T09:59:13.7528480Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7528745Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.7529547Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a798bbedf3e7b999.xml - 2025-12-04T09:59:13.7529729Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7530731Z FAILED [10.2161s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7530865Z Traceback (most recent call last): 2025-12-04T09:59:13.7531412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7531526Z getattr(self, test_name)() 2025-12-04T09:59:13.7532070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7532206Z fn() 2025-12-04T09:59:13.7532721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7532829Z method(*args, **kwargs) 2025-12-04T09:59:13.7533334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7533440Z method(*args, **kwargs) 2025-12-04T09:59:13.7534011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7534099Z with policy(): 2025-12-04T09:59:13.7534556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7534657Z raise RuntimeError(msg) 2025-12-04T09:59:13.7535916Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:13.7535921Z 2025-12-04T09:59:13.7536109Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7537113Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7537133Z 2025-12-04T09:59:13.7537399Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7537442Z 2025-12-04T09:59:13.7537606Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7537733Z Traceback (most recent call last): 2025-12-04T09:59:13.7538284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7538395Z getattr(self, test_name)() 2025-12-04T09:59:13.7538939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7539029Z fn() 2025-12-04T09:59:13.7539542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7539645Z method(*args, **kwargs) 2025-12-04T09:59:13.7540177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7540288Z method(*args, **kwargs) 2025-12-04T09:59:13.7540789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7540887Z with policy(): 2025-12-04T09:59:13.7541403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7541514Z raise RuntimeError(msg) 2025-12-04T09:59:13.7542893Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.7542899Z 2025-12-04T09:59:13.7543116Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7543963Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7543971Z 2025-12-04T09:59:13.7544236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7544241Z 2025-12-04T09:59:13.7544405Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7544563Z Traceback (most recent call last): 2025-12-04T09:59:13.7545109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7545226Z getattr(self, test_name)() 2025-12-04T09:59:13.7545759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7545846Z fn() 2025-12-04T09:59:13.7546365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7546467Z method(*args, **kwargs) 2025-12-04T09:59:13.7546970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7547082Z method(*args, **kwargs) 2025-12-04T09:59:13.7547615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7547720Z with policy(): 2025-12-04T09:59:13.7548229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7548339Z raise RuntimeError(msg) 2025-12-04T09:59:13.7549745Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.7549754Z 2025-12-04T09:59:13.7549949Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7550731Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7550736Z 2025-12-04T09:59:13.7550969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7551130Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
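The session above is the retry of the same failing test ("Retrying single test..." earlier); immediately below, the runner reports it as FAILED CONSISTENTLY and keeps going because continue-through-error is set. The real logic lives in the CI test runner; the following is only an illustrative sketch of that retry-then-continue shape:

    import subprocess

    def run_with_retry(cmd: list[str], continue_through_error: bool = True) -> bool:
        first = subprocess.run(cmd)
        if first.returncode == 0:
            return True
        print("Got exit code", first.returncode)
        print("Retrying single test...")
        second = subprocess.run(cmd)
        if second.returncode == 0:
            return True  # flaky: passed on retry
        print("FAILED CONSISTENTLY:", " ".join(cmd))
        if not continue_through_error:
            raise SystemExit(second.returncode)
        return False  # continue with the rest of the tests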
2025-12-04T09:59:13.7551301Z ====================== 1 failed, 26 deselected in 10.44s ======================= 2025-12-04T09:59:13.7551384Z Got exit code 1 2025-12-04T09:59:13.7552066Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7552454Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.7553001Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0d5d8a174cb3c98.xml 2025-12-04T09:59:13.7553153Z ============================= test session starts ============================== 2025-12-04T09:59:13.7553467Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7553575Z cachedir: .pytest_cache 2025-12-04T09:59:13.7554027Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7554134Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7554236Z configfile: pytest.ini 2025-12-04T09:59:13.7554713Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7554905Z collecting ... collected 60 items / 18 deselected / 42 selected 2025-12-04T09:59:13.7555032Z stepcurrent: skipping 18 already run items. 2025-12-04T09:59:13.7555133Z Running 9 items in this shard 2025-12-04T09:59:13.7555137Z 2025-12-04T09:59:13.7556457Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda I1204 09:52:06.254000 72510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 72562 2025-12-04T09:59:13.7556930Z I1204 09:52:06.255000 72510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 72563 2025-12-04T09:59:13.7557397Z I1204 09:52:06.255000 72510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 72564 2025-12-04T09:59:13.7557862Z I1204 09:52:06.256000 72510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 72565 2025-12-04T09:59:13.7558807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7558947Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7559903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7560037Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7561935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7562033Z _warn_cpu_init() 2025-12-04T09:59:13.7563972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7564064Z _warn_cpu_init() 2025-12-04T09:59:13.7565004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7565152Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7567046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7567139Z _warn_cpu_init() 2025-12-04T09:59:13.7568084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7568293Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7569231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7569549Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7570525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7570649Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7572430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7572522Z _warn_cpu_init() 2025-12-04T09:59:13.7573406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7573627Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7574520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7574711Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7575600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7575701Z return func(*args, **kwargs) 2025-12-04T09:59:13.7576477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7576617Z return func(*args, **kwargs) 2025-12-04T09:59:13.7577553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7577672Z return func(*args, **kwargs) 2025-12-04T09:59:13.7578434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7578538Z return func(*args, **kwargs) 2025-12-04T09:59:13.7579357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7579464Z return func(*args, **kwargs) 2025-12-04T09:59:13.7580234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7580342Z return func(*args, **kwargs) 2025-12-04T09:59:13.7581096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7581214Z return func(*args, **kwargs) 2025-12-04T09:59:13.7581963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7582071Z return func(*args, **kwargs) 2025-12-04T09:59:13.7582837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.7582944Z return func(*args, **kwargs) 2025-12-04T09:59:13.7583413Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7583976Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7584985Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7585499Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7586489Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7586897Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7587893Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7588391Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7589409Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7589849Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7590757Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7591160Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7592022Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7592457Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7594133Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 
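The RuntimeError above reports the two quantities the CUDA memory-leak check compares: bytes held by PyTorch's caching allocator and bytes the CUDA driver reports as allocated on the device, each measured before and after the test body. The following is a minimal, illustrative sketch of that kind of before/after accounting; the snapshot helper, threshold logic, and function names are assumptions for illustration and are not the test harness's actual implementation.

import torch

def snapshot(device: int):
    # Bytes currently held by PyTorch's caching allocator on this device.
    allocator_bytes = torch.cuda.memory_allocated(device)
    # Device-wide usage as seen by the CUDA driver (total minus free),
    # which also covers memory outside the caching allocator.
    free, total = torch.cuda.mem_get_info(device)
    return allocator_bytes, total - free

def run_with_leak_check(fn, device: int = 0):
    # Illustrative only: flag the test if allocator usage grew across it,
    # roughly the comparison reported in the failure above.
    before_alloc, before_driver = snapshot(device)
    fn()
    torch.cuda.synchronize(device)
    after_alloc, after_driver = snapshot(device)
    if after_alloc > before_alloc:
        raise RuntimeError(
            f"possible leak: caching allocator went from {before_alloc} "
            f"to {after_alloc} bytes (driver: {before_driver} -> {after_driver})"
        )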
2025-12-04T09:59:13.7594465Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7595062Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7596219Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7596545Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7597194Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7597708Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7598118Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7598588Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7599478Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7599939Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7600838Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7601209Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7602058Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7602498Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7603343Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7603803Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7604658Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7605049Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7605914Z [rank0]:E1204 09:52:13.900000 72562 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7606374Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7608017Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 65024 on device 0. CUDA driver allocated memory was 720306176 and is now 737083392. 2025-12-04T09:59:13.7608339Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7608927Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7610084Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7610407Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7611081Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7611563Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7611972Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7612446Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7613339Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7613818Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7614700Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7615054Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7615904Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7616441Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7617552Z [rank1]:E1204 09:52:13.900000 72563 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7618054Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7619009Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7619494Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7620474Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7621166Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7623007Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.7623368Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7624035Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7625413Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7625778Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7626504Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7627051Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7627511Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7628039Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7629090Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7629597Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7630585Z [rank2]:E1204 09:52:13.901000 72564 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7630994Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7632009Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7632619Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7633608Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7634053Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7634950Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7635345Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7636209Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7636644Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7638265Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 65024 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T09:59:13.7638592Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7639212Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7640375Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7640700Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7641342Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7641827Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7641931Z dist init r=2, world=4 2025-12-04T09:59:13.7642044Z dist init r=3, world=4 2025-12-04T09:59:13.7642132Z dist init r=1, world=4 2025-12-04T09:59:13.7642220Z dist init r=0, world=4 2025-12-04T09:59:13.7643245Z [rank0]:[W1204 09:52:14.005460998 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7643340Z FAILED [10.0260s] [ 11%] 2025-12-04T09:59:13.7643348Z 2025-12-04T09:59:13.7643475Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7643893Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.7644037Z Traceback (most recent call last): 2025-12-04T09:59:13.7644523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7644629Z self._join_processes(fn) 2025-12-04T09:59:13.7645148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7645449Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7646029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7646135Z raise RuntimeError(error) 2025-12-04T09:59:13.7646391Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7646513Z Traceback (most recent call last): 2025-12-04T09:59:13.7647020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7647134Z getattr(self, test_name)() 2025-12-04T09:59:13.7647641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7647727Z fn() 2025-12-04T09:59:13.7648214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7648313Z method(*args, **kwargs) 2025-12-04T09:59:13.7648788Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7648889Z method(*args, **kwargs) 2025-12-04T09:59:13.7649363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7649461Z with policy(): 2025-12-04T09:59:13.7649939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7650041Z raise RuntimeError(msg) 2025-12-04T09:59:13.7651376Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.7651383Z 2025-12-04T09:59:13.7651588Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7652394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7652402Z 2025-12-04T09:59:13.7652651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7652656Z 2025-12-04T09:59:13.7652815Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7652931Z Traceback (most recent call last): 2025-12-04T09:59:13.7653476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7653586Z getattr(self, test_name)() 2025-12-04T09:59:13.7654093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7654177Z fn() 2025-12-04T09:59:13.7654662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7654763Z method(*args, **kwargs) 2025-12-04T09:59:13.7655240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7655339Z method(*args, **kwargs) 2025-12-04T09:59:13.7655808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7655933Z with policy(): 2025-12-04T09:59:13.7656493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7656598Z raise RuntimeError(msg) 2025-12-04T09:59:13.7658151Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 65024 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T09:59:13.7658158Z 2025-12-04T09:59:13.7658375Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7659277Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7659285Z 2025-12-04T09:59:13.7659549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7659555Z 2025-12-04T09:59:13.7659560Z 2025-12-04T09:59:13.7659787Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7660047Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.7660848Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0d5d8a174cb3c98.xml - 2025-12-04T09:59:13.7661022Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7662032Z FAILED [10.0260s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7662164Z Traceback (most recent call last): 2025-12-04T09:59:13.7662712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7662824Z getattr(self, test_name)() 2025-12-04T09:59:13.7663396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7663485Z fn() 2025-12-04T09:59:13.7663998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7664099Z method(*args, **kwargs) 2025-12-04T09:59:13.7664607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7664725Z method(*args, **kwargs) 2025-12-04T09:59:13.7665225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7665319Z with policy(): 2025-12-04T09:59:13.7665833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7666292Z raise RuntimeError(msg) 2025-12-04T09:59:13.7667678Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 
2025-12-04T09:59:13.7667684Z 2025-12-04T09:59:13.7667896Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7668845Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7668862Z 2025-12-04T09:59:13.7669096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7669133Z 2025-12-04T09:59:13.7669277Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7669396Z Traceback (most recent call last): 2025-12-04T09:59:13.7669880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7669974Z getattr(self, test_name)() 2025-12-04T09:59:13.7670453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7670531Z fn() 2025-12-04T09:59:13.7670985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7671102Z method(*args, **kwargs) 2025-12-04T09:59:13.7671551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7671648Z method(*args, **kwargs) 2025-12-04T09:59:13.7672093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7672180Z with policy(): 2025-12-04T09:59:13.7672640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7672736Z raise RuntimeError(msg) 2025-12-04T09:59:13.7673961Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 65024 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 2025-12-04T09:59:13.7673968Z 2025-12-04T09:59:13.7674159Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7674923Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7674930Z 2025-12-04T09:59:13.7675168Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7675355Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.7675521Z ====================== 1 failed, 18 deselected in 10.24s ======================= 2025-12-04T09:59:13.7675608Z Got exit code 1 2025-12-04T09:59:13.7675700Z Retrying single test... 
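Before the retry below, note that the UserWarning repeated throughout this attempt comes from constructing FSDP around a CPU-resident module; it recommends passing `device_id` so sharding initialization (and `sync_module_states=True`) runs on the GPU. A minimal sketch of the usage the warning points at, assuming the process group is already initialized (e.g. by torchrun); the toy module and the way local_rank is derived are illustrative, not taken from the test above.

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes dist.init_process_group() has already run in this process.
local_rank = dist.get_rank() % torch.cuda.device_count()
module = torch.nn.Linear(1024, 1024)  # constructed on CPU, as in the warning

# device_id tells FSDP to move the module to this GPU before sharding
# initialization, which is also what sync_module_states=True requires.
fsdp_module = FSDP(
    module,
    device_id=torch.device("cuda", local_rank),
    sync_module_states=True,
)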
2025-12-04T09:59:13.7676259Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-931d013fb4c2579a.xml 2025-12-04T09:59:13.7676404Z ============================= test session starts ============================== 2025-12-04T09:59:13.7676717Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7676810Z cachedir: .pytest_cache 2025-12-04T09:59:13.7677264Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7677378Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7677509Z configfile: pytest.ini 2025-12-04T09:59:13.7677991Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7678185Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.7679023Z stepcurrent: skipping 18 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7679132Z Running 1 items in this shard 2025-12-04T09:59:13.7679136Z 2025-12-04T09:59:13.7680447Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda I1204 09:52:20.724000 72847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 72899 2025-12-04T09:59:13.7680962Z I1204 09:52:20.725000 72847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 72900 2025-12-04T09:59:13.7681426Z I1204 09:52:20.725000 72847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 72901 2025-12-04T09:59:13.7681889Z I1204 09:52:20.726000 72847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 72902 2025-12-04T09:59:13.7682831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7682996Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7684916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7685011Z _warn_cpu_init() 2025-12-04T09:59:13.7685954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7686081Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7687977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7688084Z _warn_cpu_init() 2025-12-04T09:59:13.7689045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7689178Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7690110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7690331Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7692358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7692462Z _warn_cpu_init() 2025-12-04T09:59:13.7693341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7693542Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7694435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7694563Z return func(*args, **kwargs) 2025-12-04T09:59:13.7695446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7695566Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7697712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7697864Z _warn_cpu_init() 2025-12-04T09:59:13.7698861Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7699095Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7700090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7700320Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7701093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7701207Z return func(*args, **kwargs) 2025-12-04T09:59:13.7701981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7702093Z return func(*args, **kwargs) 2025-12-04T09:59:13.7702884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7703004Z return func(*args, **kwargs) 2025-12-04T09:59:13.7703766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7703881Z return func(*args, **kwargs) 2025-12-04T09:59:13.7704640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7704751Z return func(*args, **kwargs) 2025-12-04T09:59:13.7705548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7705657Z return func(*args, **kwargs) 2025-12-04T09:59:13.7706422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7706530Z return func(*args, **kwargs) 2025-12-04T09:59:13.7707289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.7707408Z return func(*args, **kwargs) 2025-12-04T09:59:13.7707868Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7708416Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7709507Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7709963Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7710851Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7711236Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7712095Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7712531Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7713391Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7713822Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7714684Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7715092Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7715971Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7716417Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7718038Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 611254272 and is now 628031488. 
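The FutureWarning repeated above says the `NO_SHARD` sharding strategy is deprecated and suggests `DistributedDataParallel` instead; since NO_SHARD keeps full parameters on every rank, DDP's replicate-and-all-reduce behaviour is the closest replacement. A minimal sketch under the same assumptions as before (initialized process group, toy module, illustrative local_rank):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes dist.init_process_group() has already run in this process.
local_rank = dist.get_rank() % torch.cuda.device_count()
module = torch.nn.Linear(1024, 1024).cuda(local_rank)

# DDP replicates parameters on every rank and all-reduces gradients,
# which is what FSDP's NO_SHARD strategy effectively did.
ddp_module = DDP(module, device_ids=[local_rank])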
2025-12-04T09:59:13.7718376Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7718988Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7720164Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7720488Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7721475Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7722098Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7722620Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7723163Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7724164Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7724671Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7725707Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7726104Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7727079Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7727568Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7728531Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7729022Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7729980Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7730485Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7731459Z [rank0]:E1204 09:52:28.463000 72899 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7731957Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7733922Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 60928 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T09:59:13.7734262Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7734845Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7736009Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7736402Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7737299Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7737867Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7738321Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7738866Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7739868Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7740424Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7741419Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7741820Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7742792Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7743283Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7744248Z [rank2]:E1204 09:52:28.465000 72901 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7744738Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7745724Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7746178Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7747137Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7747640Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7749543Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.7749879Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7750459Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7751627Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7751980Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7752617Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7753110Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7753504Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7754008Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7755089Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7755582Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7756509Z [rank3]:E1204 09:52:28.465000 72902 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7756884Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7757795Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7758252Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7759196Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7759653Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7760563Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7760986Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7761889Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7762391Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7764325Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 
2025-12-04T09:59:13.7764694Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7765328Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7766638Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7766991Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7767684Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7768248Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7768344Z dist init r=1, world=4 2025-12-04T09:59:13.7768452Z dist init r=0, world=4 2025-12-04T09:59:13.7768543Z dist init r=2, world=4 2025-12-04T09:59:13.7768638Z dist init r=3, world=4 2025-12-04T09:59:13.7769770Z [rank0]:[W1204 09:52:28.482492998 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7769868Z FAILED [10.2471s] [100%] 2025-12-04T09:59:13.7769874Z 2025-12-04T09:59:13.7770025Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7770479Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.7770599Z Traceback (most recent call last): 2025-12-04T09:59:13.7771140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7771251Z self._join_processes(fn) 2025-12-04T09:59:13.7771813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7771959Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7772587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7772705Z raise RuntimeError(error) 2025-12-04T09:59:13.7772933Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7773047Z Traceback (most recent call last): 2025-12-04T09:59:13.7773585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7773692Z getattr(self, test_name)() 2025-12-04T09:59:13.7774210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7774305Z fn() 2025-12-04T09:59:13.7774796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7774937Z method(*args, **kwargs) 2025-12-04T09:59:13.7775431Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7775530Z method(*args, **kwargs) 2025-12-04T09:59:13.7776029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7776122Z with policy(): 2025-12-04T09:59:13.7776896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7777015Z raise RuntimeError(msg) 2025-12-04T09:59:13.7778396Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 60928 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T09:59:13.7778445Z 2025-12-04T09:59:13.7778671Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7779522Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7779528Z 2025-12-04T09:59:13.7779803Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7779808Z 2025-12-04T09:59:13.7780081Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7780200Z Traceback (most recent call last): 2025-12-04T09:59:13.7780756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7780870Z getattr(self, test_name)() 2025-12-04T09:59:13.7781417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7781507Z fn() 2025-12-04T09:59:13.7782018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7782129Z method(*args, **kwargs) 2025-12-04T09:59:13.7782630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7782733Z method(*args, **kwargs) 2025-12-04T09:59:13.7783242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7783343Z with policy(): 2025-12-04T09:59:13.7783858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7783966Z raise RuntimeError(msg) 2025-12-04T09:59:13.7785376Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 611254272 and is now 628031488. 
2025-12-04T09:59:13.7785391Z 2025-12-04T09:59:13.7785605Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7786454Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7786462Z 2025-12-04T09:59:13.7786737Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7786742Z 2025-12-04T09:59:13.7786905Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7787038Z Traceback (most recent call last): 2025-12-04T09:59:13.7787592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7787730Z getattr(self, test_name)() 2025-12-04T09:59:13.7788424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7788511Z fn() 2025-12-04T09:59:13.7789003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7789116Z method(*args, **kwargs) 2025-12-04T09:59:13.7789607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7789720Z method(*args, **kwargs) 2025-12-04T09:59:13.7790209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7790335Z with policy(): 2025-12-04T09:59:13.7790836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7790940Z raise RuntimeError(msg) 2025-12-04T09:59:13.7792375Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.7792391Z 2025-12-04T09:59:13.7792595Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7793425Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7793433Z 2025-12-04T09:59:13.7793689Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7793694Z 2025-12-04T09:59:13.7793699Z 2025-12-04T09:59:13.7793909Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7794163Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.7794916Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-931d013fb4c2579a.xml - 2025-12-04T09:59:13.7795077Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7796046Z FAILED [10.2471s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7796161Z Traceback (most recent call last): 2025-12-04T09:59:13.7796685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7796792Z getattr(self, test_name)() 2025-12-04T09:59:13.7797327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7797422Z fn() 2025-12-04T09:59:13.7797994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7798098Z method(*args, **kwargs) 2025-12-04T09:59:13.7798547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7798643Z method(*args, **kwargs) 2025-12-04T09:59:13.7799099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7799184Z with policy(): 2025-12-04T09:59:13.7799637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7799748Z raise RuntimeError(msg) 2025-12-04T09:59:13.7801002Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 60928 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 
2025-12-04T09:59:13.7801008Z 2025-12-04T09:59:13.7801206Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7801963Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7801970Z 2025-12-04T09:59:13.7802220Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7802249Z 2025-12-04T09:59:13.7802390Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7802500Z Traceback (most recent call last): 2025-12-04T09:59:13.7802994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7803089Z getattr(self, test_name)() 2025-12-04T09:59:13.7803563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7803648Z fn() 2025-12-04T09:59:13.7804097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7804224Z method(*args, **kwargs) 2025-12-04T09:59:13.7804668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7804759Z method(*args, **kwargs) 2025-12-04T09:59:13.7805215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7805301Z with policy(): 2025-12-04T09:59:13.7805761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7805856Z raise RuntimeError(msg) 2025-12-04T09:59:13.7807075Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 611254272 and is now 628031488. 
2025-12-04T09:59:13.7807081Z 2025-12-04T09:59:13.7807280Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7808035Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7808041Z 2025-12-04T09:59:13.7808287Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7808293Z 2025-12-04T09:59:13.7808464Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7808574Z Traceback (most recent call last): 2025-12-04T09:59:13.7809064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7809162Z getattr(self, test_name)() 2025-12-04T09:59:13.7809645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7809726Z fn() 2025-12-04T09:59:13.7810172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7810271Z method(*args, **kwargs) 2025-12-04T09:59:13.7810719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7810809Z method(*args, **kwargs) 2025-12-04T09:59:13.7811307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7811395Z with policy(): 2025-12-04T09:59:13.7811853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7811949Z raise RuntimeError(msg) 2025-12-04T09:59:13.7813168Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.7813184Z 2025-12-04T09:59:13.7813402Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7814160Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7814165Z 2025-12-04T09:59:13.7814408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7814568Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.7814728Z ====================== 1 failed, 26 deselected in 10.46s ======================= 2025-12-04T09:59:13.7814821Z Got exit code 1 2025-12-04T09:59:13.7814915Z Retrying single test... 
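The failure that is about to be retried comes from the CUDA memory-leak checker, which compares per-device memory counters captured before and after the test body. Below is a minimal sketch of that bookkeeping, using only public torch.cuda calls and assuming a CUDA-enabled PyTorch install; it only illustrates what the two reported numbers correspond to, and is not the CudaMemoryLeakCheck implementation in torch/testing/_internal/common_utils.py.

# Hedged sketch of the two measurements in the failure message above: the
# "Caching allocator allocated memory" figure corresponds to
# torch.cuda.memory_allocated(), and the "CUDA driver allocated memory" figure
# is driver-level usage, approximated here with torch.cuda.mem_get_info().
import torch

def gpu_memory_snapshot(device: int = 0) -> tuple[int, int]:
    """Return (caching_allocator_bytes, driver_used_bytes) for one device."""
    torch.cuda.synchronize(device)
    allocator_bytes = torch.cuda.memory_allocated(device)
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return allocator_bytes, total_bytes - free_bytes

if __name__ == "__main__":
    before = gpu_memory_snapshot()
    leftover = torch.empty(1024, device="cuda")  # simulated leak: kept alive on purpose
    after = gpu_memory_snapshot()
    print(f"caching allocator: {before[0]} -> {after[0]} bytes")
    print(f"driver allocated:  {before[1]} -> {after[1]} bytes")

Running the repro command printed above (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py ...) enables the same before/after comparison inside the real test.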
2025-12-04T09:59:13.7815500Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-92646f491493cae0.xml 2025-12-04T09:59:13.7815644Z ============================= test session starts ============================== 2025-12-04T09:59:13.7815956Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7816056Z cachedir: .pytest_cache 2025-12-04T09:59:13.7816596Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7816894Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7817004Z configfile: pytest.ini 2025-12-04T09:59:13.7817538Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7817767Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.7818705Z stepcurrent: skipping 18 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7818825Z Running 1 items in this shard 2025-12-04T09:59:13.7818832Z 2025-12-04T09:59:13.7820068Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda I1204 09:52:35.164000 73184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 73236 2025-12-04T09:59:13.7820602Z I1204 09:52:35.164000 73184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 73237 2025-12-04T09:59:13.7821341Z I1204 09:52:35.165000 73184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 73238 2025-12-04T09:59:13.7821839Z I1204 09:52:35.166000 73184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 73239 2025-12-04T09:59:13.7822857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7822997Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7825084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7825199Z _warn_cpu_init() 2025-12-04T09:59:13.7826197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7826427Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7827470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7827613Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7828598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7828727Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7830760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7830904Z _warn_cpu_init() 2025-12-04T09:59:13.7833037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7833138Z _warn_cpu_init() 2025-12-04T09:59:13.7834085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7834190Z return func(*args, **kwargs) 2025-12-04T09:59:13.7835123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7835373Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7836314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7836528Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7837633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7837765Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7839766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.7839863Z _warn_cpu_init() 2025-12-04T09:59:13.7840834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7841047Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7844252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7844445Z return func(*args, **kwargs) 2025-12-04T09:59:13.7845204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7845319Z return func(*args, **kwargs) 2025-12-04T09:59:13.7846062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7846165Z return func(*args, **kwargs) 2025-12-04T09:59:13.7846914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7847052Z return func(*args, **kwargs) 2025-12-04T09:59:13.7847806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7847921Z return func(*args, **kwargs) 2025-12-04T09:59:13.7848769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7848884Z return func(*args, **kwargs) 2025-12-04T09:59:13.7849594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7849695Z return func(*args, **kwargs) 2025-12-04T09:59:13.7850489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.7850586Z return func(*args, **kwargs) 2025-12-04T09:59:13.7851014Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7851485Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7852407Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7852868Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7853747Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7854113Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7854962Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7855412Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7856263Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7857164Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7858201Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7858678Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7859653Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7860142Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7861995Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 58880 on device 0. CUDA driver allocated memory was 711917568 and is now 737083392. 
2025-12-04T09:59:13.7862396Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7863054Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7864370Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7864739Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7865471Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7866019Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7866508Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7867039Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7868039Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7868551Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7869620Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7870007Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7870910Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7871375Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7872284Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7872817Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7873797Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7874192Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7875053Z [rank3]:E1204 09:52:42.931000 73239 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7875520Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7877166Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 58880 on device 3. CUDA driver allocated memory was 558825472 and is now 628031488. 2025-12-04T09:59:13.7877488Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7878067Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7879239Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7879567Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7880256Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7880741Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7881153Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7881622Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7882508Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7882972Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7883842Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7884202Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7885051Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7885498Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7886398Z [rank1]:E1204 09:52:42.932000 73237 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7886833Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7887688Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7888111Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7888974Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7889412Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7891041Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 60928 on device 1. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T09:59:13.7891363Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7891957Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7893143Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7893466Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7894115Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7894596Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7895009Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7895484Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7896443Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7897105Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7898090Z [rank2]:E1204 09:52:42.932000 73238 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7898498Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7899501Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7900026Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7900981Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7901465Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7902465Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7902912Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7903887Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7904375Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7906206Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 60928 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 
2025-12-04T09:59:13.7906574Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7907242Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7908571Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7909040Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7909806Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7910289Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7910390Z dist init r=0, world=4 2025-12-04T09:59:13.7910475Z dist init r=1, world=4 2025-12-04T09:59:13.7910557Z dist init r=2, world=4 2025-12-04T09:59:13.7910650Z dist init r=3, world=4 2025-12-04T09:59:13.7911675Z [rank0]:[W1204 09:52:43.936846942 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7911770Z FAILED [9.4939s] [100%] 2025-12-04T09:59:13.7911776Z 2025-12-04T09:59:13.7911903Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7912321Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.7912433Z Traceback (most recent call last): 2025-12-04T09:59:13.7912977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7913080Z self._join_processes(fn) 2025-12-04T09:59:13.7913607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7913733Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7914276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7914377Z raise RuntimeError(error) 2025-12-04T09:59:13.7914583Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7914720Z Traceback (most recent call last): 2025-12-04T09:59:13.7915200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7915301Z getattr(self, test_name)() 2025-12-04T09:59:13.7915783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7915861Z fn() 2025-12-04T09:59:13.7916316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7916409Z method(*args, **kwargs) 2025-12-04T09:59:13.7916853Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7916951Z method(*args, **kwargs) 2025-12-04T09:59:13.7917393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7917486Z with policy(): 2025-12-04T09:59:13.7917932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7918028Z raise RuntimeError(msg) 2025-12-04T09:59:13.7919297Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 58880 on device 0. CUDA driver allocated memory was 711917568 and is now 737083392. 2025-12-04T09:59:13.7919304Z 2025-12-04T09:59:13.7919493Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7920260Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7920268Z 2025-12-04T09:59:13.7920499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7920504Z 2025-12-04T09:59:13.7920508Z 2025-12-04T09:59:13.7920700Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7921277Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.7922244Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-92646f491493cae0.xml - 2025-12-04T09:59:13.7922430Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7923441Z FAILED [9.4939s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7923560Z Traceback (most recent call last): 2025-12-04T09:59:13.7924115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7924223Z getattr(self, test_name)() 2025-12-04T09:59:13.7924845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7924971Z fn() 2025-12-04T09:59:13.7925479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7925593Z method(*args, **kwargs) 2025-12-04T09:59:13.7926096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7926215Z method(*args, **kwargs) 2025-12-04T09:59:13.7926715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7926815Z with policy(): 2025-12-04T09:59:13.7927384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7927489Z raise RuntimeError(msg) 2025-12-04T09:59:13.7928880Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 58880 on device 0. CUDA driver allocated memory was 711917568 and is now 737083392. 2025-12-04T09:59:13.7928899Z 2025-12-04T09:59:13.7929109Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7929967Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7929972Z 2025-12-04T09:59:13.7930244Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7930423Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.7930610Z ======================= 1 failed, 26 deselected in 9.71s ======================= 2025-12-04T09:59:13.7930708Z Got exit code 1 2025-12-04T09:59:13.7931475Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7931924Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.7932543Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8232c23afc6466e0.xml 2025-12-04T09:59:13.7932716Z ============================= test session starts ============================== 2025-12-04T09:59:13.7933064Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7933173Z cachedir: .pytest_cache 2025-12-04T09:59:13.7933802Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7934033Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7934128Z configfile: pytest.ini 2025-12-04T09:59:13.7934612Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7934804Z collecting ... collected 60 items / 19 deselected / 41 selected 2025-12-04T09:59:13.7934935Z stepcurrent: skipping 19 already run items. 
2025-12-04T09:59:13.7935032Z Running 8 items in this shard 2025-12-04T09:59:13.7935037Z 2025-12-04T09:59:13.7935944Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda I1204 09:52:49.684000 73521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 73573 2025-12-04T09:59:13.7936468Z I1204 09:52:49.685000 73521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 73574 2025-12-04T09:59:13.7937103Z I1204 09:52:49.685000 73521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 73575 2025-12-04T09:59:13.7937679Z I1204 09:52:49.686000 73521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 73576 2025-12-04T09:59:13.7938932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.7939061Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.7940304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.7940459Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.7941695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.7941817Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.7943055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.7943175Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.7944138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7944261Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.7946308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7946415Z _warn_cpu_init() 2025-12-04T09:59:13.7947377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7947491Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.7948460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7948577Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.7950531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7950622Z _warn_cpu_init() 2025-12-04T09:59:13.7952428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7952543Z _warn_cpu_init() 2025-12-04T09:59:13.7953434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7953524Z fsdp_model = FSDP( 2025-12-04T09:59:13.7954370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7954477Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.7956290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7956384Z _warn_cpu_init() 2025-12-04T09:59:13.7957275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7957370Z fsdp_model = FSDP( 2025-12-04T09:59:13.7958252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7958349Z fsdp_model = FSDP( 2025-12-04T09:59:13.7959226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7959316Z fsdp_model = FSDP( 2025-12-04T09:59:13.7960025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7960130Z return func(*args, **kwargs) 2025-12-04T09:59:13.7960809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7960919Z return func(*args, **kwargs) 2025-12-04T09:59:13.7961593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7961690Z return func(*args, **kwargs) 2025-12-04T09:59:13.7962378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7962473Z return func(*args, **kwargs) 2025-12-04T09:59:13.7963147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7963240Z return func(*args, **kwargs) 2025-12-04T09:59:13.7963909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7964010Z return func(*args, **kwargs) 2025-12-04T09:59:13.7964911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7965050Z return func(*args, **kwargs) 2025-12-04T09:59:13.7965788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7965889Z return func(*args, **kwargs) 2025-12-04T09:59:13.7966833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7966936Z return func(*args, **kwargs) 2025-12-04T09:59:13.7971189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
2025-12-04T09:59:13.7971591Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.7975854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.7976236Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.7980974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.7981376Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.7985975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
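Note: the AccumulateGrad stream-mismatch warnings above spell out their own escape hatch; if the mismatch is known to be intentional, the check can be disabled globally. A one-line sketch of that suppression (it silences only this warning and does not remove any extra synchronization):

import torch

torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)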
2025-12-04T09:59:13.7986403Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.7986861Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7987408Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7988415Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7989042Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7990038Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7990400Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7991277Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7991711Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7992713Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7993189Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7994124Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7994561Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7995511Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7996088Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7997638Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 2. CUDA driver allocated memory was 607059968 and is now 678363136. 
2025-12-04T09:59:13.7998030Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7998698Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7999728Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8000141Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8000856Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8001420Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8007121Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8007658Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8008576Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8009043Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8010008Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8010368Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8011218Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8011654Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8012519Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8012953Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8013810Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8014206Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8015066Z [rank0]:E1204 09:53:00.487000 73573 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8015502Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8017420Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 0. CUDA driver allocated memory was 720306176 and is now 787415040. 2025-12-04T09:59:13.8017792Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8018446Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8019918Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8020289Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8021235Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8021784Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8022235Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8022771Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8023778Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8024296Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8025363Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8025770Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8026730Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8027216Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8028189Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8028672Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8029641Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8030088Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8031112Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8031646Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8033369Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8033693Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8034314Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8035305Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8035628Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8036274Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8036760Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8037159Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8037634Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8038554Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8039012Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8039894Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8040253Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8041108Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8041541Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8042394Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8042824Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8043680Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8044077Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8044994Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8045431Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8046898Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 1. CUDA driver allocated memory was 607059968 and is now 678363136. 
2025-12-04T09:59:13.8047255Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8047842Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8048832Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8049152Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8049794Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8050278Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8050372Z dist init r=3, world=4 2025-12-04T09:59:13.8050470Z dist init r=0, world=4 2025-12-04T09:59:13.8050558Z dist init r=1, world=4 2025-12-04T09:59:13.8050644Z dist init r=2, world=4 2025-12-04T09:59:13.8051709Z [rank0]:[W1204 09:53:00.522164021 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8051800Z FAILED [13.1067s] [ 12%] 2025-12-04T09:59:13.8051806Z 2025-12-04T09:59:13.8051946Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8052215Z ______ TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda ______ 2025-12-04T09:59:13.8052325Z Traceback (most recent call last): 2025-12-04T09:59:13.8052819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8052922Z self._join_processes(fn) 2025-12-04T09:59:13.8053453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8053581Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8054121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8054229Z raise RuntimeError(error) 2025-12-04T09:59:13.8054606Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8054725Z Traceback (most recent call last): 2025-12-04T09:59:13.8055228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8055333Z getattr(self, test_name)() 2025-12-04T09:59:13.8055839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8055954Z fn() 2025-12-04T09:59:13.8056552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8056661Z method(*args, **kwargs) 2025-12-04T09:59:13.8057339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8057454Z method(*args, **kwargs) 2025-12-04T09:59:13.8057957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8058054Z with policy(): 2025-12-04T09:59:13.8058570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8058712Z raise RuntimeError(msg) 2025-12-04T09:59:13.8059914Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8059930Z 2025-12-04T09:59:13.8060146Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8060802Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8060808Z 2025-12-04T09:59:13.8061078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8061084Z 2025-12-04T09:59:13.8061088Z 2025-12-04T09:59:13.8061307Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8061580Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8062380Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8232c23afc6466e0.xml - 2025-12-04T09:59:13.8062552Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8063422Z FAILED [13.1067s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8063547Z Traceback (most recent call last): 2025-12-04T09:59:13.8064106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8064217Z getattr(self, test_name)() 2025-12-04T09:59:13.8064755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8064853Z fn() 2025-12-04T09:59:13.8065360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8065466Z method(*args, **kwargs) 2025-12-04T09:59:13.8065981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8066082Z method(*args, **kwargs) 2025-12-04T09:59:13.8066594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8066688Z with policy(): 2025-12-04T09:59:13.8067198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8067312Z raise RuntimeError(msg) 2025-12-04T09:59:13.8068505Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. 
CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8068545Z 2025-12-04T09:59:13.8068931Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8069566Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8069571Z 2025-12-04T09:59:13.8069827Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8070006Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8070180Z ====================== 1 failed, 19 deselected in 13.32s ======================= 2025-12-04T09:59:13.8070281Z Got exit code 1 2025-12-04T09:59:13.8070380Z Retrying single test... 2025-12-04T09:59:13.8071009Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-983af60bcd722f1d.xml 2025-12-04T09:59:13.8071171Z ============================= test session starts ============================== 2025-12-04T09:59:13.8071511Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8071615Z cachedir: .pytest_cache 2025-12-04T09:59:13.8072122Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8072236Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8072346Z configfile: pytest.ini 2025-12-04T09:59:13.8072864Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8073074Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8073797Z stepcurrent: skipping 19 already run items. 
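Note: the barrier() warning and the ProcessGroupNCCL "destroy_process_group() was not called" warning in the run above both concern process-group lifecycle. A minimal sketch of the pattern those warnings ask for (backend choice and the torchrun-style LOCAL_RANK are assumptions):

import os
import torch
import torch.distributed as dist

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
device = torch.device("cuda", local_rank)
torch.cuda.set_device(device)

# Binding the default group to a device silences the barrier() "device under
# current context" warning.
dist.init_process_group(backend="nccl", device_id=device)
try:
    dist.barrier()
    # ... test or training body ...
finally:
    # Explicit teardown avoids the destroy_process_group() warning at exit.
    dist.destroy_process_group()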
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8073909Z Running 1 items in this shard 2025-12-04T09:59:13.8073914Z 2025-12-04T09:59:13.8074921Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda I1204 09:53:07.504000 73858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 73910 2025-12-04T09:59:13.8075429Z I1204 09:53:07.505000 73858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 73911 2025-12-04T09:59:13.8076116Z I1204 09:53:07.506000 73858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 73912 2025-12-04T09:59:13.8076557Z I1204 09:53:07.506000 73858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 73913 2025-12-04T09:59:13.8077665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8077792Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8078891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8079010Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8080100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8080208Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8081494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8081682Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8082608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8082724Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8084624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8084756Z _warn_cpu_init() 2025-12-04T09:59:13.8085667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
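Note: the enable_nested_tensor warnings above are emitted because the encoder layer was constructed with batch_first left at False. A minimal illustrative sketch of the configuration the warning points to (the sizes are placeholders):

import torch.nn as nn

# batch_first=True keeps TransformerEncoder's nested-tensor fast path usable.
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2, enable_nested_tensor=True)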
2025-12-04T09:59:13.8085785Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8087677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8087779Z _warn_cpu_init() 2025-12-04T09:59:13.8088709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8088809Z fsdp_model = FSDP( 2025-12-04T09:59:13.8089836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8089932Z fsdp_model = FSDP( 2025-12-04T09:59:13.8090845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8090954Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8091852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8091967Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8093909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8093995Z _warn_cpu_init() 2025-12-04T09:59:13.8095779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8095894Z _warn_cpu_init() 2025-12-04T09:59:13.8097082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8097185Z fsdp_model = FSDP( 2025-12-04T09:59:13.8098166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
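Note: the FutureWarnings above (and in the first run) say the NO_SHARD strategy is deprecated in favor of DistributedDataParallel. A minimal sketch of the DDP equivalent, with the module and process-group setup as illustrative assumptions:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)
if not dist.is_initialized():
    dist.init_process_group(backend="nccl")

# NO_SHARD keeps full parameters on every rank, which is what DDP does already.
model = torch.nn.Linear(16, 16).cuda(local_rank)  # placeholder module
ddp_model = DDP(model, device_ids=[local_rank])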
2025-12-04T09:59:13.8098273Z fsdp_model = FSDP( 2025-12-04T09:59:13.8099047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8099194Z return func(*args, **kwargs) 2025-12-04T09:59:13.8099965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8100073Z return func(*args, **kwargs) 2025-12-04T09:59:13.8100839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8100946Z return func(*args, **kwargs) 2025-12-04T09:59:13.8101712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8101815Z return func(*args, **kwargs) 2025-12-04T09:59:13.8102570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8102686Z return func(*args, **kwargs) 2025-12-04T09:59:13.8103441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8103552Z return func(*args, **kwargs) 2025-12-04T09:59:13.8104329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8104436Z return func(*args, **kwargs) 2025-12-04T09:59:13.8105193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8105299Z return func(*args, **kwargs) 2025-12-04T09:59:13.8106306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8106416Z return func(*args, **kwargs) 2025-12-04T09:59:13.8110843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
2025-12-04T09:59:13.8111219Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8115207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.8115582Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8119606Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.8119961Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8124643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.8125044Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8125503Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8126053Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8127137Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8127682Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8128686Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8129080Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8130092Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8130585Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8131557Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8132040Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8132993Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8133457Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8134435Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8134919Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8136461Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 1. CUDA driver allocated memory was 607059968 and is now 678363136. 
2025-12-04T09:59:13.8136986Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8137650Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8138782Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8139145Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8139867Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8140420Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8140873Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8141480Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8142490Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8142993Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8143994Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8144421Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8145395Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8145883Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8146843Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8147329Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8148295Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8148749Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8149760Z [rank0]:E1204 09:53:17.868000 73910 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8150205Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8151665Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 0. CUDA driver allocated memory was 720306176 and is now 787415040. 2025-12-04T09:59:13.8151996Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8152582Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8153573Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8153892Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8154525Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8155019Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8155469Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8155945Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8156831Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8157274Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8158183Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8158537Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8159399Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8159825Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8160678Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8161111Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8161960Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8162393Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8163245Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8163687Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8165154Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 2. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8165486Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8166067Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8167047Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8167380Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8168009Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8168553Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8168952Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8169428Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8170309Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8170785Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8171667Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8172019Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8172876Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8173304Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8174164Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8174592Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8175474Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8175878Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8177014Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8177518Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8179176Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. CUDA driver allocated memory was 609157120 and is now 678363136. 
2025-12-04T09:59:13.8179548Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8180202Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8181304Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8181672Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8182449Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8183003Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8183103Z dist init r=0, world=4 2025-12-04T09:59:13.8183199Z dist init r=2, world=4 2025-12-04T09:59:13.8183302Z dist init r=1, world=4 2025-12-04T09:59:13.8183396Z dist init r=3, world=4 2025-12-04T09:59:13.8184562Z [rank0]:[W1204 09:53:18.900793628 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8184692Z FAILED [12.7750s] [100%] 2025-12-04T09:59:13.8184700Z 2025-12-04T09:59:13.8184850Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8185160Z ______ TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda ______ 2025-12-04T09:59:13.8185277Z Traceback (most recent call last): 2025-12-04T09:59:13.8185829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8185938Z self._join_processes(fn) 2025-12-04T09:59:13.8186520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8186664Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8187270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8187385Z raise RuntimeError(error) 2025-12-04T09:59:13.8187622Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8187740Z Traceback (most recent call last): 2025-12-04T09:59:13.8188284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8188422Z getattr(self, test_name)() 2025-12-04T09:59:13.8189067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8189162Z fn() 2025-12-04T09:59:13.8189637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8189732Z method(*args, **kwargs) 2025-12-04T09:59:13.8190215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8190312Z method(*args, **kwargs) 2025-12-04T09:59:13.8190790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8190882Z with policy(): 2025-12-04T09:59:13.8191359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8191470Z raise RuntimeError(msg) 2025-12-04T09:59:13.8192593Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 2. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8192599Z 2025-12-04T09:59:13.8192812Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8193427Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8193432Z 2025-12-04T09:59:13.8193739Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8193751Z 2025-12-04T09:59:13.8193756Z 2025-12-04T09:59:13.8193963Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8194317Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8195045Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-983af60bcd722f1d.xml - 2025-12-04T09:59:13.8195194Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8195929Z FAILED [12.7750s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8196067Z Traceback (most recent call last): 2025-12-04T09:59:13.8196557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8196669Z getattr(self, test_name)() 2025-12-04T09:59:13.8197144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8197222Z fn() 2025-12-04T09:59:13.8197677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8197768Z method(*args, **kwargs) 2025-12-04T09:59:13.8198216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8198306Z method(*args, **kwargs) 2025-12-04T09:59:13.8198755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8198847Z with policy(): 2025-12-04T09:59:13.8199309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8199404Z raise RuntimeError(msg) 2025-12-04T09:59:13.8200497Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 2. 
CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8200504Z 2025-12-04T09:59:13.8200693Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8201271Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8201278Z 2025-12-04T09:59:13.8201516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8201671Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8201837Z ====================== 1 failed, 26 deselected in 12.99s ======================= 2025-12-04T09:59:13.8201920Z Got exit code 1 2025-12-04T09:59:13.8202011Z Retrying single test... 2025-12-04T09:59:13.8202565Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-84ede3fbd174dfda.xml 2025-12-04T09:59:13.8202707Z ============================= test session starts ============================== 2025-12-04T09:59:13.8203013Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8203115Z cachedir: .pytest_cache 2025-12-04T09:59:13.8203576Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8203690Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8203781Z configfile: pytest.ini 2025-12-04T09:59:13.8204255Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8204508Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8205162Z stepcurrent: skipping 19 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8205276Z Running 1 items in this shard 2025-12-04T09:59:13.8205281Z 2025-12-04T09:59:13.8206184Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda I1204 09:53:25.024000 74195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 74247 2025-12-04T09:59:13.8206625Z I1204 09:53:25.025000 74195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 74248 2025-12-04T09:59:13.8207102Z I1204 09:53:25.026000 74195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 74249 2025-12-04T09:59:13.8207540Z I1204 09:53:25.026000 74195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 74250 2025-12-04T09:59:13.8208657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8208768Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8209857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8209976Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8211074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8211190Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8212306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8212421Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8213276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8213376Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8215174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8215260Z _warn_cpu_init() 2025-12-04T09:59:13.8216125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.8216226Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8218509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8218640Z _warn_cpu_init() 2025-12-04T09:59:13.8219634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8219744Z fsdp_model = FSDP( 2025-12-04T09:59:13.8220710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8221081Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8223099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8223204Z _warn_cpu_init() 2025-12-04T09:59:13.8224194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8224292Z fsdp_model = FSDP( 2025-12-04T09:59:13.8225259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8225372Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8227451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8227551Z _warn_cpu_init() 2025-12-04T09:59:13.8228542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8228645Z fsdp_model = FSDP( 2025-12-04T09:59:13.8229625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.8229731Z fsdp_model = FSDP( 2025-12-04T09:59:13.8230497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8230609Z return func(*args, **kwargs) 2025-12-04T09:59:13.8231375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8231480Z return func(*args, **kwargs) 2025-12-04T09:59:13.8232242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8232351Z return func(*args, **kwargs) 2025-12-04T09:59:13.8233225Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8233360Z return func(*args, **kwargs) 2025-12-04T09:59:13.8234034Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8234134Z return func(*args, **kwargs) 2025-12-04T09:59:13.8234800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8234893Z return func(*args, **kwargs) 2025-12-04T09:59:13.8235602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8235694Z return func(*args, **kwargs) 2025-12-04T09:59:13.8236371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8236463Z return func(*args, **kwargs) 2025-12-04T09:59:13.8237344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8237443Z return func(*args, **kwargs) 2025-12-04T09:59:13.8241443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
2025-12-04T09:59:13.8241802Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8245786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.8246134Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8250156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.8250547Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8254525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.8254879Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8255281Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8255782Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8256910Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8257431Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8258430Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8258827Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8259795Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8260281Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8261236Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8261733Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8262751Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8263207Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8264168Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8264664Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8266345Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 0. CUDA driver allocated memory was 720306176 and is now 787415040. 
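The AccumulateGrad stream-mismatch UserWarning repeated above states its own remedies: delete references that keep the previous iteration's autograd graph alive (for example a retained loss tensor), or, if the mismatch is intentional, silence the warning with the setter it names. A minimal sketch of the latter, assuming a PyTorch build recent enough to expose the function quoted in the warning:

import torch

# Only appropriate when the stream mismatch is known to be intentional; otherwise
# prefer dropping the lingering references to the autograd graph instead.
torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)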
2025-12-04T09:59:13.8266721Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8267373Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8268478Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8268959Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8269721Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8270211Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8270637Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8271107Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8271998Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8272448Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8273329Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8273681Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8274533Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8274964Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8275812Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8276310Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8277167Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8277565Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8278421Z [rank1]:E1204 09:53:35.998000 74248 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8278891Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8280360Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 1. CUDA driver allocated memory was 609157120 and is now 678363136. 2025-12-04T09:59:13.8280690Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8281273Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8282252Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8282580Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8283237Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8283725Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8284122Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8284593Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8285485Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8285937Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8286825Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8287177Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8288033Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8288461Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8289367Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8289807Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8290657Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8291061Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8291945Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8292383Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8293845Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8294172Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8294758Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8295735Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8296085Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8296963Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8297518Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8297968Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8298494Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8299499Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8300007Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8301004Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8301396Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8302370Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8302931Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8303888Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8304372Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8305323Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8305860Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8306819Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8307317Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8309071Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 2. CUDA driver allocated memory was 607059968 and is now 678363136. 
2025-12-04T09:59:13.8309397Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8309989Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8310994Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8311320Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8311955Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8312445Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8312532Z dist init r=1, world=4 2025-12-04T09:59:13.8312620Z dist init r=3, world=4 2025-12-04T09:59:13.8312711Z dist init r=0, world=4 2025-12-04T09:59:13.8312795Z dist init r=2, world=4 2025-12-04T09:59:13.8313822Z [rank0]:[W1204 09:53:36.033367191 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8313917Z FAILED [13.2387s] [100%] 2025-12-04T09:59:13.8313922Z 2025-12-04T09:59:13.8314051Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8314321Z ______ TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda ______ 2025-12-04T09:59:13.8314427Z Traceback (most recent call last): 2025-12-04T09:59:13.8314909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8315010Z self._join_processes(fn) 2025-12-04T09:59:13.8315581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8315713Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8316251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8316351Z raise RuntimeError(error) 2025-12-04T09:59:13.8316562Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8316670Z Traceback (most recent call last): 2025-12-04T09:59:13.8317151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8317281Z getattr(self, test_name)() 2025-12-04T09:59:13.8317752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8317837Z fn() 2025-12-04T09:59:13.8318289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8318382Z method(*args, **kwargs) 2025-12-04T09:59:13.8318834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8318925Z method(*args, **kwargs) 2025-12-04T09:59:13.8319373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8319463Z with policy(): 2025-12-04T09:59:13.8319913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8320017Z raise RuntimeError(msg) 2025-12-04T09:59:13.8321400Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8321411Z 2025-12-04T09:59:13.8321630Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8322363Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8322370Z 2025-12-04T09:59:13.8322634Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8322640Z 2025-12-04T09:59:13.8322645Z 2025-12-04T09:59:13.8322868Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8323127Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8323934Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-84ede3fbd174dfda.xml - 2025-12-04T09:59:13.8324105Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8324932Z FAILED [13.2387s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8325065Z Traceback (most recent call last): 2025-12-04T09:59:13.8325615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8325730Z getattr(self, test_name)() 2025-12-04T09:59:13.8326267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8326359Z fn() 2025-12-04T09:59:13.8326866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8327027Z method(*args, **kwargs) 2025-12-04T09:59:13.8327569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8327680Z method(*args, **kwargs) 2025-12-04T09:59:13.8328185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8328289Z with policy(): 2025-12-04T09:59:13.8328799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8328909Z raise RuntimeError(msg) 2025-12-04T09:59:13.8330112Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. 
CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8330156Z 2025-12-04T09:59:13.8330373Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8331045Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8331050Z 2025-12-04T09:59:13.8331314Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8331496Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8331678Z ====================== 1 failed, 26 deselected in 13.46s ======================= 2025-12-04T09:59:13.8331775Z Got exit code 1 2025-12-04T09:59:13.8332355Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8332760Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.8333380Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9538bfd24f807d16.xml 2025-12-04T09:59:13.8333659Z ============================= test session starts ============================== 2025-12-04T09:59:13.8333971Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8334092Z cachedir: .pytest_cache 2025-12-04T09:59:13.8334553Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8334663Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8334763Z configfile: pytest.ini 2025-12-04T09:59:13.8335237Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8335427Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T09:59:13.8335554Z stepcurrent: skipping 20 already run items. 
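Two recurring warnings in these runs concern process-group lifecycle: barrier() asks for a device_id in init_process_group, and ProcessGroupNCCL notes that destroy_process_group() was never called before exit. A hedged sketch of the pattern both warnings point at, assuming a torchrun-style launch that sets LOCAL_RANK and the rendezvous environment variables:

import os
import torch
import torch.distributed as dist

def main():
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # device_id lets collectives such as barrier() pick the right GPU without guessing.
    dist.init_process_group("nccl", device_id=torch.device("cuda", local_rank))
    try:
        dist.barrier()
        # ... test or training body ...
    finally:
        # Explicit teardown avoids the ProcessGroupNCCL shutdown warning seen above.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()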
2025-12-04T09:59:13.8335652Z Running 7 items in this shard 2025-12-04T09:59:13.8335659Z 2025-12-04T09:59:13.8336665Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda I1204 09:53:42.954000 74532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 74584 2025-12-04T09:59:13.8337333Z I1204 09:53:42.955000 74532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 74585 2025-12-04T09:59:13.8337830Z I1204 09:53:42.955000 74532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 74586 2025-12-04T09:59:13.8338319Z I1204 09:53:42.956000 74532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 74587 2025-12-04T09:59:13.8339576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8339774Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8341015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8341139Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8342364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8342514Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8343740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8343864Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8345894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8345989Z _warn_cpu_init() 2025-12-04T09:59:13.8348001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.8348098Z _warn_cpu_init() 2025-12-04T09:59:13.8350102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8350195Z _warn_cpu_init() 2025-12-04T09:59:13.8351965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8352060Z _warn_cpu_init() 2025-12-04T09:59:13.8352945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8353050Z return func(*args, **kwargs) 2025-12-04T09:59:13.8353455Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8353928Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8354882Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8355334Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8356221Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8356571Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8357455Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8357893Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8358740Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8359172Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8360018Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8360418Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8361273Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8361735Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8363208Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T09:59:13.8363538Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8364119Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8365131Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8365456Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8366086Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8366575Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8366971Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8367715Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8368612Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8369059Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8369944Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8370323Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8371180Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T09:59:13.8371619Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8372470Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8372911Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8373765Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8374176Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8375070Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8375507Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8377258Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.8377636Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8378308Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8379438Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8379804Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8380520Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8381108Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8381584Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8382113Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8383120Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8383627Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8384647Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8385042Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8386010Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8386494Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8387452Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8387940Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8389006Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8389560Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8390417Z [rank1]:E1204 09:53:53.722000 74585 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8390858Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8392338Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.8392660Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8393249Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8394245Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8394574Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8395233Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8395743Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8396142Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8396610Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8397503Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8397976Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8398865Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8399215Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8400066Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8400496Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8401343Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8401780Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8402648Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8403046Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8403899Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8404337Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8405815Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 586088448 and is now 651100160. 2025-12-04T09:59:13.8406136Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8406726Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8407725Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8408108Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8408741Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8409227Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8409313Z dist init r=2, world=4 2025-12-04T09:59:13.8409397Z dist init r=0, world=4 2025-12-04T09:59:13.8409487Z dist init r=1, world=4 2025-12-04T09:59:13.8409568Z dist init r=3, world=4 2025-12-04T09:59:13.8410619Z [rank0]:[W1204 09:53:54.743027594 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8410715Z FAILED [12.7101s] [ 14%] 2025-12-04T09:59:13.8410721Z 2025-12-04T09:59:13.8410852Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8411132Z ___ TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda ____ 2025-12-04T09:59:13.8411237Z Traceback (most recent call last): 2025-12-04T09:59:13.8411722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8411822Z self._join_processes(fn) 2025-12-04T09:59:13.8412336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8412467Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8413005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8413104Z raise RuntimeError(error) 2025-12-04T09:59:13.8413315Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.8413422Z Traceback (most recent call last): 2025-12-04T09:59:13.8413929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8414034Z getattr(self, test_name)() 2025-12-04T09:59:13.8414508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8414593Z fn() 2025-12-04T09:59:13.8415038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8415132Z method(*args, **kwargs) 2025-12-04T09:59:13.8415584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8415673Z method(*args, **kwargs) 2025-12-04T09:59:13.8416117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8416207Z with policy(): 2025-12-04T09:59:13.8416907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8417025Z raise RuntimeError(msg) 2025-12-04T09:59:13.8418241Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:13.8418249Z 2025-12-04T09:59:13.8418464Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8419158Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8419201Z 2025-12-04T09:59:13.8419493Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8419499Z 2025-12-04T09:59:13.8419669Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8419790Z Traceback (most recent call last): 2025-12-04T09:59:13.8420339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8420453Z getattr(self, test_name)() 2025-12-04T09:59:13.8421205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8421304Z fn() 2025-12-04T09:59:13.8421892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8421996Z method(*args, **kwargs) 2025-12-04T09:59:13.8422508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8422612Z method(*args, **kwargs) 2025-12-04T09:59:13.8423120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8423224Z with policy(): 2025-12-04T09:59:13.8423735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8423852Z raise RuntimeError(msg) 2025-12-04T09:59:13.8425063Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8425071Z 2025-12-04T09:59:13.8425290Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8425969Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8425974Z 2025-12-04T09:59:13.8426232Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8426280Z 2025-12-04T09:59:13.8426285Z 2025-12-04T09:59:13.8426509Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8426766Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.8427578Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9538bfd24f807d16.xml - 2025-12-04T09:59:13.8427747Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8428603Z FAILED [12.7101s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.8428732Z Traceback (most recent call last): 2025-12-04T09:59:13.8429277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8429398Z getattr(self, test_name)() 2025-12-04T09:59:13.8429934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8430019Z fn() 2025-12-04T09:59:13.8430530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8430629Z method(*args, **kwargs) 2025-12-04T09:59:13.8431130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8431239Z method(*args, **kwargs) 2025-12-04T09:59:13.8431815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8431922Z with policy(): 2025-12-04T09:59:13.8432428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8432536Z raise RuntimeError(msg) 2025-12-04T09:59:13.8433808Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:13.8433815Z 2025-12-04T09:59:13.8434014Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8434695Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8434703Z 2025-12-04T09:59:13.8434947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8434952Z 2025-12-04T09:59:13.8435100Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8435215Z Traceback (most recent call last): 2025-12-04T09:59:13.8435730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8435843Z getattr(self, test_name)() 2025-12-04T09:59:13.8436343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8436424Z fn() 2025-12-04T09:59:13.8436901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8436998Z method(*args, **kwargs) 2025-12-04T09:59:13.8437472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8437572Z method(*args, **kwargs) 2025-12-04T09:59:13.8438041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8438133Z with policy(): 2025-12-04T09:59:13.8438641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8438741Z raise RuntimeError(msg) 2025-12-04T09:59:13.8440059Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8440069Z 2025-12-04T09:59:13.8440273Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8440946Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8440953Z 2025-12-04T09:59:13.8441205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8441375Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8441552Z ====================== 1 failed, 20 deselected in 12.93s ======================= 2025-12-04T09:59:13.8441644Z Got exit code 1 2025-12-04T09:59:13.8441753Z Retrying single test... 
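The leak report above is produced by the test harness's CUDA memory check, which the printed repro command enables via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1. A simplified, illustrative sketch of that kind of before/after comparison is below; it assumes a single visible CUDA device, and the real check in torch/testing/_internal/common_utils.py differs in detail.

# Illustrative sketch only; not the harness's actual CheckCudaMemoryLeaks logic.
import contextlib
import torch

@contextlib.contextmanager
def cuda_leak_check(device: int = 0):
    # Snapshot caching-allocator and driver-level memory before the test body.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    caching_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before
    yield
    # Re-snapshot after the test body and flag growth in both counters.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    caching_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after
    if caching_after > caching_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{caching_before} -> {caching_after}, driver {driver_before} -> {driver_after}"
        )

Usage would mirror the failure mode in the log: wrap the test body, e.g. `with cuda_leak_check(0): run_test()`, and a leak surfaces as the same kind of "was X and is now Y" RuntimeError.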
2025-12-04T09:59:13.8442353Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e7d2c56cd2be4bb.xml 2025-12-04T09:59:13.8442512Z ============================= test session starts ============================== 2025-12-04T09:59:13.8442852Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8442954Z cachedir: .pytest_cache 2025-12-04T09:59:13.8443525Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8443642Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8443741Z configfile: pytest.ini 2025-12-04T09:59:13.8444267Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8444472Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8445207Z stepcurrent: skipping 20 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8445348Z Running 1 items in this shard 2025-12-04T09:59:13.8445353Z 2025-12-04T09:59:13.8446360Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda I1204 09:54:00.474000 74869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 74921 2025-12-04T09:59:13.8446858Z I1204 09:54:00.475000 74869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 74922 2025-12-04T09:59:13.8447338Z I1204 09:54:00.476000 74869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 74923 2025-12-04T09:59:13.8447812Z I1204 09:54:00.476000 74869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 74924 2025-12-04T09:59:13.8449029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8449153Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8450362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8450485Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8451708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8451825Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8453015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8453144Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8455111Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8455216Z _warn_cpu_init() 2025-12-04T09:59:13.8457438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8457600Z _warn_cpu_init() 2025-12-04T09:59:13.8459647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8459752Z _warn_cpu_init() 2025-12-04T09:59:13.8461765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8461902Z _warn_cpu_init() 2025-12-04T09:59:13.8462897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.8463009Z return func(*args, **kwargs) 2025-12-04T09:59:13.8463480Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8464013Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8465033Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8465543Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8466563Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8466968Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8467934Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8468624Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8469555Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8470033Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8470954Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8471383Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8472323Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8472865Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8474488Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 
2025-12-04T09:59:13.8474836Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8475482Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8476604Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8476952Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8477654Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8478179Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8478626Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8479137Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8480116Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8480638Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8481593Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8481981Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8483093Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8483537Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8484381Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8484817Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8485662Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8486055Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8486964Z [rank2]:E1204 09:54:11.280000 74923 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8487399Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8488889Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8489238Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8489830Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8490836Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8491156Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8491793Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8492276Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8492685Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8493156Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8494072Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8494518Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8495398Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8495755Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8496836Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8497339Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8498292Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8498782Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8499746Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8500256Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8501224Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8501713Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8503380Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8503777Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8504438Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8505576Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8505935Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8506661Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8507202Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8507658Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8508209Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8509398Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8509849Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8510723Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8511082Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8511931Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8512368Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8513216Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8513650Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8514622Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8515023Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8515881Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8516318Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8517832Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T09:59:13.8518158Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8518747Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8519747Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8520065Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8520706Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8521517Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8521692Z dist init r=0, world=4 2025-12-04T09:59:13.8521792Z dist init r=1, world=4 2025-12-04T09:59:13.8521885Z dist init r=2, world=4 2025-12-04T09:59:13.8522087Z dist init r=3, world=4 2025-12-04T09:59:13.8523244Z [rank0]:[W1204 09:54:11.302714537 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8523357Z FAILED [12.8737s] [100%] 2025-12-04T09:59:13.8523363Z 2025-12-04T09:59:13.8523513Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8523827Z ___ TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda ____ 2025-12-04T09:59:13.8523953Z Traceback (most recent call last): 2025-12-04T09:59:13.8524503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8524613Z self._join_processes(fn) 2025-12-04T09:59:13.8525202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8525340Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8525953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8526068Z raise RuntimeError(error) 2025-12-04T09:59:13.8526299Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.8526423Z Traceback (most recent call last): 2025-12-04T09:59:13.8527060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8527177Z getattr(self, test_name)() 2025-12-04T09:59:13.8527710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8527795Z fn() 2025-12-04T09:59:13.8528310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8528414Z method(*args, **kwargs) 2025-12-04T09:59:13.8528912Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8529057Z method(*args, **kwargs) 2025-12-04T09:59:13.8529567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8529665Z with policy(): 2025-12-04T09:59:13.8530177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8530285Z raise RuntimeError(msg) 2025-12-04T09:59:13.8531508Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8531515Z 2025-12-04T09:59:13.8531729Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8532423Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8532431Z 2025-12-04T09:59:13.8532693Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8532699Z 2025-12-04T09:59:13.8532861Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8532989Z Traceback (most recent call last): 2025-12-04T09:59:13.8533740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8533875Z getattr(self, test_name)() 2025-12-04T09:59:13.8534350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8534426Z fn() 2025-12-04T09:59:13.8534880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8534971Z method(*args, **kwargs) 2025-12-04T09:59:13.8535414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8535511Z method(*args, **kwargs) 2025-12-04T09:59:13.8535953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8536045Z with policy(): 2025-12-04T09:59:13.8536560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8536660Z raise RuntimeError(msg) 2025-12-04T09:59:13.8538050Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.8538057Z 2025-12-04T09:59:13.8538268Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8538962Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8538968Z 2025-12-04T09:59:13.8539268Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8539302Z 2025-12-04T09:59:13.8539307Z 2025-12-04T09:59:13.8539534Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8539795Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8540599Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e7d2c56cd2be4bb.xml - 2025-12-04T09:59:13.8540776Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8541627Z FAILED [12.8737s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.8541780Z Traceback (most recent call last): 2025-12-04T09:59:13.8542325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8542440Z getattr(self, test_name)() 2025-12-04T09:59:13.8542976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8543063Z fn() 2025-12-04T09:59:13.8543567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8543674Z method(*args, **kwargs) 2025-12-04T09:59:13.8544174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8544283Z method(*args, **kwargs) 2025-12-04T09:59:13.8544787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8544880Z with policy(): 2025-12-04T09:59:13.8545393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8545502Z raise RuntimeError(msg) 2025-12-04T09:59:13.8546750Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.8546763Z 2025-12-04T09:59:13.8546972Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8547653Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8547660Z 2025-12-04T09:59:13.8547929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8547934Z 2025-12-04T09:59:13.8548094Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8548221Z Traceback (most recent call last): 2025-12-04T09:59:13.8548879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8548987Z getattr(self, test_name)() 2025-12-04T09:59:13.8549591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8549671Z fn() 2025-12-04T09:59:13.8550285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8550392Z method(*args, **kwargs) 2025-12-04T09:59:13.8550863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8550966Z method(*args, **kwargs) 2025-12-04T09:59:13.8551435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8551561Z with policy(): 2025-12-04T09:59:13.8552073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8552172Z raise RuntimeError(msg) 2025-12-04T09:59:13.8553318Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8553324Z 2025-12-04T09:59:13.8553522Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8554160Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8554202Z 2025-12-04T09:59:13.8554455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8554625Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8554799Z ====================== 1 failed, 26 deselected in 13.09s ======================= 2025-12-04T09:59:13.8554888Z Got exit code 1 2025-12-04T09:59:13.8554988Z Retrying single test... 
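Two warnings repeated in this log point at process-group setup and teardown: barrier() "using the device under current context" (fixable by passing `device_id` to `init_process_group`) and the ProcessGroupNCCL shutdown warning about `destroy_process_group()` not being called. A hedged sketch of what those messages recommend, assuming a torchrun-style launcher that sets RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT:

# Sketch of the setup/teardown the warnings above suggest; not the test harness's code.
import os
import torch
import torch.distributed as dist

def main() -> None:
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    device = torch.device("cuda", rank % torch.cuda.device_count())
    torch.cuda.set_device(device)
    # Passing device_id tells collectives which device to use, silencing the barrier() warning.
    dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)
    try:
        dist.barrier()
        # ... test or training body ...
    finally:
        # Explicit teardown avoids the ProcessGroupNCCL resource-leak warning at exit.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()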
2025-12-04T09:59:13.8555581Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1378f62336ac1630.xml 2025-12-04T09:59:13.8555735Z ============================= test session starts ============================== 2025-12-04T09:59:13.8556061Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8556165Z cachedir: .pytest_cache 2025-12-04T09:59:13.8556649Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8556766Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8556867Z configfile: pytest.ini 2025-12-04T09:59:13.8557373Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8557582Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8558321Z stepcurrent: skipping 20 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8558432Z Running 1 items in this shard 2025-12-04T09:59:13.8558437Z 2025-12-04T09:59:13.8559408Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda I1204 09:54:18.024000 75206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 75258 2025-12-04T09:59:13.8559876Z I1204 09:54:18.025000 75206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 75259 2025-12-04T09:59:13.8560348Z I1204 09:54:18.026000 75206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 75260 2025-12-04T09:59:13.8560805Z I1204 09:54:18.027000 75206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 75261 2025-12-04T09:59:13.8561992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8562113Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8563274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8563428Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8564677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8564792Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8565874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8565988Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8567814Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8567903Z _warn_cpu_init() 2025-12-04T09:59:13.8569685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8569774Z _warn_cpu_init() 2025-12-04T09:59:13.8571605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8571692Z _warn_cpu_init() 2025-12-04T09:59:13.8573486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8573572Z _warn_cpu_init() 2025-12-04T09:59:13.8574458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.8574557Z return func(*args, **kwargs) 2025-12-04T09:59:13.8574963Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8575444Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8576408Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8577068Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8578093Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8578522Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8579484Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8579969Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8580962Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8581453Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8582412Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8582852Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8583815Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8584303Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8585990Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:13.8586360Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8587017Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8588164Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8588526Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8589321Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8589810Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8590208Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8590684Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8591567Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8592081Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8592962Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8593317Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8594168Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8594623Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8595482Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8595915Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8596772Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8597163Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8598020Z [rank2]:E1204 09:54:28.805000 75260 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8598455Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8599948Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.8600278Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8600861Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8601872Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8602193Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8602837Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8603318Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8603719Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8604192Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8605103Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8605584Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8606464Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8606821Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8607696Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8608134Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8608990Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8609418Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8610277Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8610676Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8611542Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8612006Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8613489Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.8613821Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8614404Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8615421Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8615744Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8616455Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8617146Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8617598Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8618203Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8619205Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8619723Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8620712Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8621356Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8622327Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8622813Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8623783Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8624270Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8625238Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8625690Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8626718Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8627207Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8628871Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.8629247Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8629903Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8631046Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8631408Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8632132Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8632676Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8632960Z dist init r=1, world=4 2025-12-04T09:59:13.8633056Z dist init r=0, world=4 2025-12-04T09:59:13.8633139Z dist init r=3, world=4 2025-12-04T09:59:13.8633221Z dist init r=2, world=4 2025-12-04T09:59:13.8634253Z [rank0]:[W1204 09:54:29.826022596 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8634345Z FAILED [12.4666s] [100%] 2025-12-04T09:59:13.8634350Z 2025-12-04T09:59:13.8634483Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8634796Z ___ TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda ____ 2025-12-04T09:59:13.8634901Z Traceback (most recent call last): 2025-12-04T09:59:13.8635395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8635497Z self._join_processes(fn) 2025-12-04T09:59:13.8636018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8636146Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8636685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8636787Z raise RuntimeError(error) 2025-12-04T09:59:13.8636993Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8637096Z Traceback (most recent call last): 2025-12-04T09:59:13.8637585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8637682Z getattr(self, test_name)() 2025-12-04T09:59:13.8638167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8638245Z fn() 2025-12-04T09:59:13.8638694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8638830Z method(*args, **kwargs) 2025-12-04T09:59:13.8639279Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8639368Z method(*args, **kwargs) 2025-12-04T09:59:13.8639820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8639907Z with policy(): 2025-12-04T09:59:13.8640365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8640459Z raise RuntimeError(msg) 2025-12-04T09:59:13.8641539Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.8641553Z 2025-12-04T09:59:13.8641743Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8642346Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8642352Z 2025-12-04T09:59:13.8642590Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8642596Z 2025-12-04T09:59:13.8642601Z 2025-12-04T09:59:13.8642796Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8643033Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8643795Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1378f62336ac1630.xml - 2025-12-04T09:59:13.8643948Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8644711Z FAILED [12.4666s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8644816Z Traceback (most recent call last): 2025-12-04T09:59:13.8645309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8645434Z getattr(self, test_name)() 2025-12-04T09:59:13.8645908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8645992Z fn() 2025-12-04T09:59:13.8646442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8646532Z method(*args, **kwargs) 2025-12-04T09:59:13.8646989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8647078Z method(*args, **kwargs) 2025-12-04T09:59:13.8647532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8647618Z with policy(): 2025-12-04T09:59:13.8648065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8648166Z raise RuntimeError(msg) 2025-12-04T09:59:13.8649251Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.8649258Z 2025-12-04T09:59:13.8649454Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8650083Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8650089Z 2025-12-04T09:59:13.8650324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8650488Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8650645Z ====================== 1 failed, 26 deselected in 12.69s ======================= 2025-12-04T09:59:13.8650739Z Got exit code 1 2025-12-04T09:59:13.8651269Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8651628Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.8652188Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e092965a6aa7362.xml 2025-12-04T09:59:13.8652331Z ============================= test session starts ============================== 2025-12-04T09:59:13.8652646Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8652740Z cachedir: .pytest_cache 2025-12-04T09:59:13.8653196Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8653313Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8653406Z configfile: pytest.ini 2025-12-04T09:59:13.8653879Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8654101Z collecting ... collected 60 items / 21 deselected / 39 selected 2025-12-04T09:59:13.8654250Z stepcurrent: skipping 21 already run items. 
2025-12-04T09:59:13.8654355Z Running 6 items in this shard 2025-12-04T09:59:13.8654359Z 2025-12-04T09:59:13.8655265Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda I1204 09:54:35.503000 75543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 75595 2025-12-04T09:59:13.8655704Z I1204 09:54:35.504000 75543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 75596 2025-12-04T09:59:13.8656145Z I1204 09:54:35.505000 75543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 75597 2025-12-04T09:59:13.8656851Z I1204 09:54:35.506000 75543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 75598 2025-12-04T09:59:13.8658117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8658246Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8659483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8659614Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8660850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8660983Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8662219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8662386Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8663355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8663468Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8664436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8664551Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8666578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.8666677Z _warn_cpu_init() 2025-12-04T09:59:13.8668696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8668825Z _warn_cpu_init() 2025-12-04T09:59:13.8669846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8669954Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8670812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8670915Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8672692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8672813Z _warn_cpu_init() 2025-12-04T09:59:13.8674587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8674674Z _warn_cpu_init() 2025-12-04T09:59:13.8675560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8675652Z fsdp_model = FSDP( 2025-12-04T09:59:13.8676532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8676646Z fsdp_model = FSDP( 2025-12-04T09:59:13.8677521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8677615Z fsdp_model = FSDP( 2025-12-04T09:59:13.8678486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.8678581Z fsdp_model = FSDP( 2025-12-04T09:59:13.8679467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8679564Z return func(*args, **kwargs) 2025-12-04T09:59:13.8680249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8680347Z return func(*args, **kwargs) 2025-12-04T09:59:13.8681035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8681131Z return func(*args, **kwargs) 2025-12-04T09:59:13.8681807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8681909Z return func(*args, **kwargs) 2025-12-04T09:59:13.8682640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8682746Z return func(*args, **kwargs) 2025-12-04T09:59:13.8683417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8683510Z return func(*args, **kwargs) 2025-12-04T09:59:13.8684187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8684307Z return func(*args, **kwargs) 2025-12-04T09:59:13.8684989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8685084Z return func(*args, **kwargs) 2025-12-04T09:59:13.8685755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.8685857Z return func(*args, **kwargs) 2025-12-04T09:59:13.8686267Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8686746Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8687631Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8688083Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8688963Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8689340Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8690199Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8690631Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8691493Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8691925Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8692777Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8693182Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8694031Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8694476Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8695983Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 714014720 and is now 762249216. 
2025-12-04T09:59:13.8696396Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8697196Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8698341Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8698718Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8699434Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8699985Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8700435Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8700973Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8701975Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8702486Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8703502Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8703903Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8704871Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8705360Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8706329Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8706817Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8707779Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8708230Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8709546Z [rank1]:E1204 09:54:46.996000 75596 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8710019Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8711475Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 1. CUDA driver allocated memory was 604962816 and is now 653197312. 2025-12-04T09:59:13.8711809Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8712420Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8713398Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8713728Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8714361Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8714850Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8715252Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8715736Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8716619Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8717089Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8717970Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8718317Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8719180Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8719613Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8720470Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8721046Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8722162Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8722619Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8723788Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8724287Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8725942Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:59:13.8726350Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8727010Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8728112Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8728484Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8729197Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8729747Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8730197Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8730740Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8731774Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8732285Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8733391Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8733889Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8734750Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8735182Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8736037Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8736534Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8737680Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8738161Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8739121Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8739611Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8741246Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 2. CUDA driver allocated memory was 602865664 and is now 653197312. 
2025-12-04T09:59:13.8741646Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8742303Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8743399Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8743765Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8744480Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8745032Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8745130Z dist init r=1, world=4 2025-12-04T09:59:13.8745225Z dist init r=3, world=4 2025-12-04T09:59:13.8745322Z dist init r=2, world=4 2025-12-04T09:59:13.8745446Z dist init r=0, world=4 2025-12-04T09:59:13.8746609Z [rank0]:[W1204 09:54:47.043765415 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8746710Z FAILED [13.9911s] [ 16%] 2025-12-04T09:59:13.8746716Z 2025-12-04T09:59:13.8746864Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8747170Z ______ TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda _______ 2025-12-04T09:59:13.8747289Z Traceback (most recent call last): 2025-12-04T09:59:13.8747838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8747952Z self._join_processes(fn) 2025-12-04T09:59:13.8748534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8748678Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8749351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8749451Z raise RuntimeError(error) 2025-12-04T09:59:13.8749669Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.8749776Z Traceback (most recent call last): 2025-12-04T09:59:13.8750263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8750389Z getattr(self, test_name)() 2025-12-04T09:59:13.8750890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8750978Z fn() 2025-12-04T09:59:13.8751432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8751525Z method(*args, **kwargs) 2025-12-04T09:59:13.8751980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8752071Z method(*args, **kwargs) 2025-12-04T09:59:13.8752532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8752644Z with policy(): 2025-12-04T09:59:13.8753094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8753195Z raise RuntimeError(msg) 2025-12-04T09:59:13.8754254Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 1. CUDA driver allocated memory was 604962816 and is now 653197312. 2025-12-04T09:59:13.8754260Z 2025-12-04T09:59:13.8754458Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8755031Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8755037Z 2025-12-04T09:59:13.8755270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8755277Z 2025-12-04T09:59:13.8755426Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8755532Z Traceback (most recent call last): 2025-12-04T09:59:13.8756023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8756121Z getattr(self, test_name)() 2025-12-04T09:59:13.8756598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8756705Z fn() 2025-12-04T09:59:13.8757153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8757242Z method(*args, **kwargs) 2025-12-04T09:59:13.8757695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8757787Z method(*args, **kwargs) 2025-12-04T09:59:13.8758235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8758320Z with policy(): 2025-12-04T09:59:13.8758772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8758875Z raise RuntimeError(msg) 2025-12-04T09:59:13.8759926Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 607059968 and is now 653197312. 
2025-12-04T09:59:13.8759932Z 2025-12-04T09:59:13.8760129Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8760703Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8760710Z 2025-12-04T09:59:13.8760943Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8760948Z 2025-12-04T09:59:13.8760961Z 2025-12-04T09:59:13.8761153Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8761435Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8762155Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e092965a6aa7362.xml - 2025-12-04T09:59:13.8762305Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8763031Z FAILED [13.9911s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.8763143Z Traceback (most recent call last): 2025-12-04T09:59:13.8763655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8763765Z getattr(self, test_name)() 2025-12-04T09:59:13.8764240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8764321Z fn() 2025-12-04T09:59:13.8764773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8764867Z method(*args, **kwargs) 2025-12-04T09:59:13.8765322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8765411Z method(*args, **kwargs) 2025-12-04T09:59:13.8765857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8765954Z with policy(): 2025-12-04T09:59:13.8766409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8766502Z raise RuntimeError(msg) 2025-12-04T09:59:13.8767563Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 1. CUDA driver allocated memory was 604962816 and is now 653197312. 
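The same reproduction can also be launched from Python instead of the shell; a small equivalent using only the standard library, assuming the working directory is the PyTorch repo root (setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 would silence the banner instead):

    # Same reproduction as the command printed above, launched via subprocess.
    # Assumes the current working directory is the PyTorch repo root.
    import os
    import subprocess

    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda",
        ],
        env=env,
        check=False,
    )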
2025-12-04T09:59:13.8767568Z 2025-12-04T09:59:13.8767800Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8768389Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8768393Z 2025-12-04T09:59:13.8768628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8768633Z 2025-12-04T09:59:13.8768783Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8768885Z Traceback (most recent call last): 2025-12-04T09:59:13.8769373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8769478Z getattr(self, test_name)() 2025-12-04T09:59:13.8769951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8770028Z fn() 2025-12-04T09:59:13.8770480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8770573Z method(*args, **kwargs) 2025-12-04T09:59:13.8771030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8771120Z method(*args, **kwargs) 2025-12-04T09:59:13.8771563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8771655Z with policy(): 2025-12-04T09:59:13.8772101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8772250Z raise RuntimeError(msg) 2025-12-04T09:59:13.8773315Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:59:13.8773319Z 2025-12-04T09:59:13.8773508Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8774088Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8774092Z 2025-12-04T09:59:13.8774347Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8774510Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8774667Z ====================== 1 failed, 21 deselected in 14.21s ======================= 2025-12-04T09:59:13.8774752Z Got exit code 1 2025-12-04T09:59:13.8774852Z Retrying single test... 
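The ProcessGroupNCCL warning in the run above notes that destroy_process_group() was never called before exit. A minimal teardown pattern that avoids the warning in a standalone distributed script, assuming torchrun or a similar launcher provides RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT (illustrative only, not the test harness itself):

    # Illustrative teardown pattern; assumes torchrun (or similar) sets
    # RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT in the environment.
    import os
    import torch
    import torch.distributed as dist

    def main():
        rank = int(os.environ["RANK"])
        world_size = int(os.environ["WORLD_SIZE"])
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            torch.cuda.set_device(rank % torch.cuda.device_count())
            dist.barrier()          # ... distributed work goes here ...
        finally:
            dist.destroy_process_group()  # explicit shutdown releases NCCL resources

    if __name__ == "__main__":
        main()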
2025-12-04T09:59:13.8775399Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19aef0a0802c58a7.xml 2025-12-04T09:59:13.8775542Z ============================= test session starts ============================== 2025-12-04T09:59:13.8775860Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8775951Z cachedir: .pytest_cache 2025-12-04T09:59:13.8776510Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8776623Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8776893Z configfile: pytest.ini 2025-12-04T09:59:13.8777438Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8777657Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8778387Z stepcurrent: skipping 21 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8778545Z Running 1 items in this shard 2025-12-04T09:59:13.8778551Z 2025-12-04T09:59:13.8779568Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda I1204 09:54:53.963000 75880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 75932 2025-12-04T09:59:13.8780076Z I1204 09:54:53.964000 75880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 75933 2025-12-04T09:59:13.8780570Z I1204 09:54:53.965000 75880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 75934 2025-12-04T09:59:13.8781066Z I1204 09:54:53.966000 75880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 75935 2025-12-04T09:59:13.8782322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8782449Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8783692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8783817Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8785081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8785231Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8786460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8786578Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8787549Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8787695Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8789843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8789940Z _warn_cpu_init() 2025-12-04T09:59:13.8790819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8790908Z fsdp_model = FSDP( 2025-12-04T09:59:13.8791763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8791865Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8792721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8792822Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8793698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8793800Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8795600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8795698Z _warn_cpu_init() 2025-12-04T09:59:13.8797474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8797567Z _warn_cpu_init() 2025-12-04T09:59:13.8799368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8799485Z _warn_cpu_init() 2025-12-04T09:59:13.8800372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8800471Z return func(*args, **kwargs) 2025-12-04T09:59:13.8801358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8801474Z fsdp_model = FSDP( 2025-12-04T09:59:13.8802358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8802453Z fsdp_model = FSDP( 2025-12-04T09:59:13.8803320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8803418Z fsdp_model = FSDP( 2025-12-04T09:59:13.8804103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8804205Z return func(*args, **kwargs) 2025-12-04T09:59:13.8804884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8804982Z return func(*args, **kwargs) 2025-12-04T09:59:13.8805664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8805761Z return func(*args, **kwargs) 2025-12-04T09:59:13.8806468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8806561Z return func(*args, **kwargs) 2025-12-04T09:59:13.8807235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8807336Z return func(*args, **kwargs) 2025-12-04T09:59:13.8808004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8808101Z return func(*args, **kwargs) 2025-12-04T09:59:13.8808773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8808864Z return func(*args, **kwargs) 2025-12-04T09:59:13.8809539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.8809630Z return func(*args, **kwargs) 2025-12-04T09:59:13.8810040Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8810513Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8811403Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8811913Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8812791Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8813151Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8814009Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8814491Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8815346Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8815775Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8816871Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8817319Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8818293Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8818787Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8820473Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 720306176 and is now 762249216. 
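The enable_nested_tensor UserWarnings at the start of this retry fire because the encoder layer was constructed without batch_first=True. A short sketch of the construction the warning asks for, with arbitrary example dimensions rather than the ones used by the test:

    # Construction that satisfies the enable_nested_tensor check;
    # d_model/nhead/sizes are arbitrary example values.
    import torch
    import torch.nn as nn

    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)

    x = torch.randn(8, 16, 64)   # (batch, seq, d_model) because batch_first=True
    out = encoder(x)
    print(out.shape)             # torch.Size([8, 16, 64])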
2025-12-04T09:59:13.8821042Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8821709Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8822824Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8823190Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8823910Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8824452Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8824909Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8825438Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8826551Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8827063Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8828047Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8828448Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8829446Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8829935Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8830904Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8831388Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8832349Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8833004Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8833922Z [rank1]:E1204 09:55:05.388000 75933 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8834601Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8836204Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 1. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:59:13.8836556Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8837192Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8838271Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8838626Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8839327Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8839852Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8840282Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8840862Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8841843Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8842338Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8843289Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8843710Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8844637Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8845110Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8846042Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8846512Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8847541Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8847969Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8848911Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8849371Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8850916Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 2. CUDA driver allocated memory was 609157120 and is now 653197312. 2025-12-04T09:59:13.8851267Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8851881Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8852917Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8853257Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8853936Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8854475Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8855107Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8855627Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8856669Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8857360Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8858385Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8858795Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8859751Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8860237Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8861203Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8861690Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8862659Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8863129Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8864098Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8864582Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8866226Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 604962816 and is now 653197312. 
2025-12-04T09:59:13.8866598Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8867251Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8868359Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8868717Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8869566Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8870136Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8870239Z dist init r=1, world=4 2025-12-04T09:59:13.8870340Z dist init r=0, world=4 2025-12-04T09:59:13.8870434Z dist init r=3, world=4 2025-12-04T09:59:13.8870524Z dist init r=2, world=4 2025-12-04T09:59:13.8871654Z [rank0]:[W1204 09:55:05.426532359 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8871781Z FAILED [13.7381s] [100%] 2025-12-04T09:59:13.8871787Z 2025-12-04T09:59:13.8871933Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8872232Z ______ TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda _______ 2025-12-04T09:59:13.8872346Z Traceback (most recent call last): 2025-12-04T09:59:13.8872882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8872991Z self._join_processes(fn) 2025-12-04T09:59:13.8873561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8873696Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8874281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8874400Z raise RuntimeError(error) 2025-12-04T09:59:13.8874624Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.8874746Z Traceback (most recent call last): 2025-12-04T09:59:13.8875274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8875378Z getattr(self, test_name)() 2025-12-04T09:59:13.8875929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8876017Z fn() 2025-12-04T09:59:13.8876506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8876612Z method(*args, **kwargs) 2025-12-04T09:59:13.8877096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8877204Z method(*args, **kwargs) 2025-12-04T09:59:13.8877691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8877783Z with policy(): 2025-12-04T09:59:13.8878287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8878388Z raise RuntimeError(msg) 2025-12-04T09:59:13.8879543Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 720306176 and is now 762249216. 2025-12-04T09:59:13.8879555Z 2025-12-04T09:59:13.8879762Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8880391Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8880400Z 2025-12-04T09:59:13.8880659Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8880664Z 2025-12-04T09:59:13.8880698Z 2025-12-04T09:59:13.8880936Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8881196Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8881967Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19aef0a0802c58a7.xml - 2025-12-04T09:59:13.8882132Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8882927Z FAILED [13.7381s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.8883075Z Traceback (most recent call last): 2025-12-04T09:59:13.8883609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8883714Z getattr(self, test_name)() 2025-12-04T09:59:13.8884237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8884328Z fn() 2025-12-04T09:59:13.8884814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8884914Z method(*args, **kwargs) 2025-12-04T09:59:13.8885409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8885506Z method(*args, **kwargs) 2025-12-04T09:59:13.8886001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8886093Z with policy(): 2025-12-04T09:59:13.8886583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8886695Z raise RuntimeError(msg) 2025-12-04T09:59:13.8887849Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. 
CUDA driver allocated memory was 720306176 and is now 762249216. 2025-12-04T09:59:13.8887883Z 2025-12-04T09:59:13.8888099Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8888729Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8888735Z 2025-12-04T09:59:13.8888989Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8889279Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8889452Z ====================== 1 failed, 26 deselected in 13.96s ======================= 2025-12-04T09:59:13.8889551Z Got exit code 1 2025-12-04T09:59:13.8889649Z Retrying single test... 2025-12-04T09:59:13.8890233Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e8c70689f4db333.xml 2025-12-04T09:59:13.8890390Z ============================= test session starts ============================== 2025-12-04T09:59:13.8890716Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8890817Z cachedir: .pytest_cache 2025-12-04T09:59:13.8891304Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8891484Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8896854Z configfile: pytest.ini 2025-12-04T09:59:13.8897463Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8897685Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8898562Z stepcurrent: skipping 21 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8898677Z Running 1 items in this shard 2025-12-04T09:59:13.8898685Z 2025-12-04T09:59:13.8899707Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda I1204 09:55:12.404000 76217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 76269 2025-12-04T09:59:13.8900213Z I1204 09:55:12.405000 76217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 76270 2025-12-04T09:59:13.8900736Z I1204 09:55:12.406000 76217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 76271 2025-12-04T09:59:13.8901226Z I1204 09:55:12.406000 76217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 76272 2025-12-04T09:59:13.8902483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8902610Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8903853Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8903979Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8905215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8905338Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8906600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8906719Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8907685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8907807Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8908878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8909107Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8909956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8910056Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8910902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8910998Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8912830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8912944Z _warn_cpu_init() 2025-12-04T09:59:13.8914740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8914825Z _warn_cpu_init() 2025-12-04T09:59:13.8916640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8916731Z _warn_cpu_init() 2025-12-04T09:59:13.8918502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8918598Z _warn_cpu_init() 2025-12-04T09:59:13.8919475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8919576Z fsdp_model = FSDP( 2025-12-04T09:59:13.8920476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8920563Z fsdp_model = FSDP( 2025-12-04T09:59:13.8921824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8921924Z fsdp_model = FSDP( 2025-12-04T09:59:13.8922917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.8923013Z fsdp_model = FSDP( 2025-12-04T09:59:13.8924013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8924128Z return func(*args, **kwargs) 2025-12-04T09:59:13.8924901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8925013Z return func(*args, **kwargs) 2025-12-04T09:59:13.8925776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8925885Z return func(*args, **kwargs) 2025-12-04T09:59:13.8926651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8926944Z return func(*args, **kwargs) 2025-12-04T09:59:13.8927711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8927825Z return func(*args, **kwargs) 2025-12-04T09:59:13.8928580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8928693Z return func(*args, **kwargs) 2025-12-04T09:59:13.8929446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8929599Z return func(*args, **kwargs) 2025-12-04T09:59:13.8930361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8930475Z return func(*args, **kwargs) 2025-12-04T09:59:13.8931235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.8931340Z return func(*args, **kwargs) 2025-12-04T09:59:13.8931808Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8932337Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8933337Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8933938Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8934853Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8935215Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8936066Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8936581Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8937709Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8938200Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8939167Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8939610Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8940577Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8941141Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8942801Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 716111872 and is now 762249216. 
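Both UserWarnings repeated above point at the same fix: the CPU-init warning from fsdp/_init_utils.py asks for a device_id on the FSDP constructor, and the barrier() warning from c10d_logger.py asks for a device_id on init_process_group. A combined sketch, assuming torchrun sets LOCAL_RANK plus the usual rendezvous variables and that the wrapped module is a placeholder:

    # Sketch of the two `device_id` hints from the warnings above; assumes a
    # launcher such as torchrun sets LOCAL_RANK, MASTER_ADDR and MASTER_PORT.
    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)

    # Binding the process group to a device silences the barrier() warning.
    dist.init_process_group("nccl", device_id=device)

    # device_id moves the CPU-built module to the GPU before FSDP initializes
    # sharding, silencing _warn_cpu_init().
    model = FSDP(nn.Linear(32, 32), device_id=device)

    dist.barrier()
    dist.destroy_process_group()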
2025-12-04T09:59:13.8943166Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8943827Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8944966Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8945328Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8946052Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8946595Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8947051Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8947579Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8948683Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8949319Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8950196Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8950550Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8951396Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8951838Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8952683Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8953117Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8953973Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8954366Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8955252Z [rank2]:E1204 09:55:23.866000 76271 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8955713Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8957173Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 2. CUDA driver allocated memory was 609157120 and is now 653197312. 2025-12-04T09:59:13.8957522Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8958099Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8959082Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8959404Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8960043Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8960523Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8960930Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8961400Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8962307Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8962764Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8963640Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8963995Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8964845Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8965282Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8966136Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8966564Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8967420Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8967883Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8968745Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8969175Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8970631Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 1. CUDA driver allocated memory was 609157120 and is now 653197312. 2025-12-04T09:59:13.8971146Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8971777Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8972817Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8973159Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8973839Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8974348Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8974772Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8975308Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8976248Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8976978Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8977968Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8978373Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8979332Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8979819Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8980787Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8981272Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8982279Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8982755Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8983722Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8984211Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8985890Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 604962816 and is now 653197312. 
2025-12-04T09:59:13.8986266Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8986928Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8988037Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8988398Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8989324Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8989839Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8989930Z dist init r=3, world=4 2025-12-04T09:59:13.8990056Z dist init r=2, world=4 2025-12-04T09:59:13.8990147Z dist init r=0, world=4 2025-12-04T09:59:13.8990242Z dist init r=1, world=4 2025-12-04T09:59:13.8991418Z [rank0]:[W1204 09:55:24.910910865 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8991511Z FAILED [14.0231s] [100%] 2025-12-04T09:59:13.8991517Z 2025-12-04T09:59:13.8991656Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8991923Z ______ TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda _______ 2025-12-04T09:59:13.8992031Z Traceback (most recent call last): 2025-12-04T09:59:13.8992522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8992617Z self._join_processes(fn) 2025-12-04T09:59:13.8993141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8993262Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8993799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8993905Z raise RuntimeError(error) 2025-12-04T09:59:13.8994117Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.8994227Z Traceback (most recent call last): 2025-12-04T09:59:13.8994702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8994858Z getattr(self, test_name)() 2025-12-04T09:59:13.8995327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8995413Z fn() 2025-12-04T09:59:13.8995857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8995947Z method(*args, **kwargs) 2025-12-04T09:59:13.8996394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8996485Z method(*args, **kwargs) 2025-12-04T09:59:13.8996966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8997051Z with policy(): 2025-12-04T09:59:13.8997501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8997605Z raise RuntimeError(msg) 2025-12-04T09:59:13.8998665Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 716111872 and is now 762249216. 2025-12-04T09:59:13.8998671Z 2025-12-04T09:59:13.8998865Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8999447Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8999454Z 2025-12-04T09:59:13.8999690Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8999695Z 2025-12-04T09:59:13.8999842Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8999948Z Traceback (most recent call last): 2025-12-04T09:59:13.9000435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9000532Z getattr(self, test_name)() 2025-12-04T09:59:13.9001031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9001116Z fn() 2025-12-04T09:59:13.9001561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9001652Z method(*args, **kwargs) 2025-12-04T09:59:13.9002098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9002191Z method(*args, **kwargs) 2025-12-04T09:59:13.9002643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9002729Z with policy(): 2025-12-04T09:59:13.9003179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9003280Z raise RuntimeError(msg) 2025-12-04T09:59:13.9004338Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 604962816 and is now 653197312. 
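The repro command the harness prints (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda) can also be driven from a small script, which is sometimes convenient when re-running or bisecting. The snippet below is a sketch under the assumption that it runs from a PyTorch checkout — the cwd value is a placeholder, and subprocess.run plus os.environ are standard-library calls.

import os
import subprocess

env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
cmd = [
    "python",
    "test/distributed/fsdp/test_fsdp_core.py",
    "TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda",
]
# cwd must be the base repo dir, as the message says; the path below is a placeholder.
result = subprocess.run(cmd, env=env, cwd="/path/to/pytorch")
print("exit code:", result.returncode)  # non-zero when the test fails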
2025-12-04T09:59:13.9004343Z 2025-12-04T09:59:13.9004536Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9005105Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.9005112Z 2025-12-04T09:59:13.9005344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9005380Z 2025-12-04T09:59:13.9005384Z 2025-12-04T09:59:13.9005605Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9005836Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9006553Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e8c70689f4db333.xml - 2025-12-04T09:59:13.9006707Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9007447Z FAILED [14.0231s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9007579Z Traceback (most recent call last): 2025-12-04T09:59:13.9008060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9008164Z getattr(self, test_name)() 2025-12-04T09:59:13.9008638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9008716Z fn() 2025-12-04T09:59:13.9009173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9009263Z method(*args, **kwargs) 2025-12-04T09:59:13.9009710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9009799Z method(*args, **kwargs) 2025-12-04T09:59:13.9010248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9010339Z with policy(): 2025-12-04T09:59:13.9010787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9010883Z raise RuntimeError(msg) 2025-12-04T09:59:13.9011975Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 716111872 and is now 762249216. 
2025-12-04T09:59:13.9011981Z 2025-12-04T09:59:13.9012169Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9012750Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.9012756Z 2025-12-04T09:59:13.9012985Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9012992Z 2025-12-04T09:59:13.9013142Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.9013247Z Traceback (most recent call last): 2025-12-04T09:59:13.9013730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9013834Z getattr(self, test_name)() 2025-12-04T09:59:13.9014311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9014387Z fn() 2025-12-04T09:59:13.9014836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9014927Z method(*args, **kwargs) 2025-12-04T09:59:13.9015377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9015467Z method(*args, **kwargs) 2025-12-04T09:59:13.9015912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9016000Z with policy(): 2025-12-04T09:59:13.9016609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9016883Z raise RuntimeError(msg) 2025-12-04T09:59:13.9018101Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 604962816 and is now 653197312. 2025-12-04T09:59:13.9018106Z 2025-12-04T09:59:13.9018315Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9018966Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.9019005Z 2025-12-04T09:59:13.9019270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9019454Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
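The ProcessGroupNCCL warning earlier in this run ("destroy_process_group() was not called before program exit, which can leak resources") points at the shutdown pattern the linked documentation recommends. A minimal sketch of that pattern follows, assuming the usual env:// rendezvous set up by the launcher; init_process_group and destroy_process_group are real torch.distributed calls, and the run() wrapper is only for illustration.

import torch.distributed as dist

def run(rank: int, world_size: int) -> None:
    # Assumes MASTER_ADDR/MASTER_PORT are provided by the launcher (env:// rendezvous).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        ...  # test or training body goes here
    finally:
        dist.destroy_process_group()  # explicit shutdown; avoids the ProcessGroupNCCL warning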
2025-12-04T09:59:13.9019641Z ====================== 1 failed, 26 deselected in 14.24s ======================= 2025-12-04T09:59:13.9019734Z Got exit code 1 2025-12-04T09:59:13.9020313Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.9020715Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.9021547Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-389219a70e101b44.xml 2025-12-04T09:59:13.9021714Z ============================= test session starts ============================== 2025-12-04T09:59:13.9022064Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9022178Z cachedir: .pytest_cache 2025-12-04T09:59:13.9022693Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9022819Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9022929Z configfile: pytest.ini 2025-12-04T09:59:13.9023531Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9023755Z collecting ... collected 60 items / 22 deselected / 38 selected 2025-12-04T09:59:13.9023892Z stepcurrent: skipping 22 already run items. 2025-12-04T09:59:13.9023999Z Running 5 items in this shard 2025-12-04T09:59:13.9024004Z 2025-12-04T09:59:13.9025014Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda I1204 09:55:30.994000 76554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 76606 2025-12-04T09:59:13.9025511Z I1204 09:55:30.994000 76554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 76607 2025-12-04T09:59:13.9026012Z I1204 09:55:30.995000 76554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 76608 2025-12-04T09:59:13.9026503Z I1204 09:55:30.996000 76554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 76609 2025-12-04T09:59:13.9027747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9027882Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9029112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9029241Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9030548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9030671Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9031902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: 
UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9032023Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9034253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9034347Z _warn_cpu_init() 2025-12-04T09:59:13.9036265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9036354Z _warn_cpu_init() 2025-12-04T09:59:13.9038407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9038536Z _warn_cpu_init() 2025-12-04T09:59:13.9040428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9040518Z _warn_cpu_init() 2025-12-04T09:59:13.9041453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.9041562Z return func(*args, **kwargs) 2025-12-04T09:59:13.9041992Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9042499Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9043440Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9043918Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9044878Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9045280Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9046185Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9046638Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9047571Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9048029Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9048932Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9049355Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9050255Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9050725Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9052581Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 737083392. 
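The _warn_cpu_init UserWarning repeated above recommends passing device_id so that FSDP's sharding initialization runs on the GPU instead of on the CPU-resident module, and the failing tests here are the offload_true variants. Below is a sketch of a wrapper that follows that suggestion; the wrap() helper is illustrative, while FullyShardedDataParallel's device_id, cpu_offload and sync_module_states arguments and the CPUOffload config are part of the public FSDP API.

import torch
from torch.distributed.fsdp import CPUOffload, FullyShardedDataParallel as FSDP

def wrap(module: torch.nn.Module, rank: int) -> FSDP:
    # device_id moves the CPU-resident module to the right GPU for sharding init,
    # which also satisfies the sync_module_states=True requirement from the warning;
    # offload_params=True mirrors the "offload_true" test variants.
    return FSDP(
        module,
        device_id=torch.device("cuda", rank),
        cpu_offload=CPUOffload(offload_params=True),
        sync_module_states=True,
    )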
2025-12-04T09:59:13.9052937Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9053635Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9054596Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9054919Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9055553Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9056041Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9056519Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9057206Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9058213Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9058786Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9059784Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9060175Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9061141Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9061654Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9062621Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9063106Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9064065Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9064515Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9065476Z [rank2]:E1204 09:55:43.817000 76608 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9065975Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9067636Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.9068007Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9068660Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9069767Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9070099Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9070735Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9071223Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9071621Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9072105Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9073017Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9073491Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9074372Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9074723Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9075576Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9076034Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9076886Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9077315Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9078164Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9078564Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9079417Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9079856Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9081315Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.9081640Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9082235Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9083189Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9083513Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9084144Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9084623Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9085024Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9085517Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9086436Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9086882Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9087760Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
772, in wrapper 2025-12-04T09:59:13.9088137Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9088985Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9089422Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9090269Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9090705Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9091551Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9091948Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9092838Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9093272Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9094709Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 
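The enable_nested_tensor UserWarning emitted while the test model is built says the nested-tensor fast path is disabled because the encoder layer was not constructed with batch_first=True. A short sketch of the construction that avoids the warning is below; d_model, nhead and num_layers are placeholder values, and TransformerEncoderLayer/TransformerEncoder are the standard torch.nn modules quoted in the warning.

import torch.nn as nn

# batch_first=True on the layer lets enable_nested_tensor take effect on the encoder.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)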
2025-12-04T09:59:13.9095033Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9095626Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9096651Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9097190Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9097905Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9098452Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9098596Z dist init r=3, world=4 2025-12-04T09:59:13.9098692Z dist init r=1, world=4 2025-12-04T09:59:13.9098815Z dist init r=0, world=4 2025-12-04T09:59:13.9098917Z dist init r=2, world=4 2025-12-04T09:59:13.9100069Z [rank0]:[W1204 09:55:44.851114529 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.9100174Z FAILED [14.3450s] [ 20%] 2025-12-04T09:59:13.9100180Z 2025-12-04T09:59:13.9100325Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9100619Z ________ TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda _________ 2025-12-04T09:59:13.9100770Z Traceback (most recent call last): 2025-12-04T09:59:13.9101321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9101435Z self._join_processes(fn) 2025-12-04T09:59:13.9102019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9102161Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9102772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9102881Z raise RuntimeError(error) 2025-12-04T09:59:13.9103114Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.9103235Z Traceback (most recent call last): 2025-12-04T09:59:13.9103773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9103887Z getattr(self, test_name)() 2025-12-04T09:59:13.9104416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9104502Z fn() 2025-12-04T09:59:13.9105013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9105112Z method(*args, **kwargs) 2025-12-04T09:59:13.9105639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T09:59:13.9105743Z method(*args, **kwargs) 2025-12-04T09:59:13.9106247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9106344Z with policy(): 2025-12-04T09:59:13.9106851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9106958Z raise RuntimeError(msg) 2025-12-04T09:59:13.9108135Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T09:59:13.9108144Z 2025-12-04T09:59:13.9108355Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9109190Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9109197Z 2025-12-04T09:59:13.9109432Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9109437Z 2025-12-04T09:59:13.9109441Z 2025-12-04T09:59:13.9109639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9109868Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9110567Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-389219a70e101b44.xml - 2025-12-04T09:59:13.9110774Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9111486Z FAILED [14.3450s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.9111596Z Traceback (most recent call last): 2025-12-04T09:59:13.9112081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9112178Z getattr(self, test_name)() 2025-12-04T09:59:13.9112656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9112770Z fn() 2025-12-04T09:59:13.9113214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9113312Z method(*args, **kwargs) 2025-12-04T09:59:13.9113759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9113855Z method(*args, **kwargs) 2025-12-04T09:59:13.9114298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9114379Z with policy(): 2025-12-04T09:59:13.9114830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9114922Z raise RuntimeError(msg) 2025-12-04T09:59:13.9115955Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. 
CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T09:59:13.9115970Z 2025-12-04T09:59:13.9116155Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9116714Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9116719Z 2025-12-04T09:59:13.9116980Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9117136Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9117297Z ====================== 1 failed, 22 deselected in 14.56s ======================= 2025-12-04T09:59:13.9117377Z Got exit code 1 2025-12-04T09:59:13.9117466Z Retrying single test... 2025-12-04T09:59:13.9118019Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22aad73f608511a0.xml 2025-12-04T09:59:13.9118160Z ============================= test session starts ============================== 2025-12-04T09:59:13.9118470Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9118567Z cachedir: .pytest_cache 2025-12-04T09:59:13.9119018Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9119132Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9119222Z configfile: pytest.ini 2025-12-04T09:59:13.9119695Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9119889Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9120513Z stepcurrent: skipping 22 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9120613Z Running 1 items in this shard 2025-12-04T09:59:13.9120622Z 2025-12-04T09:59:13.9121976Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda I1204 09:55:50.154000 76891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 76943 2025-12-04T09:59:13.9122521Z I1204 09:55:50.154000 76891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 76944 2025-12-04T09:59:13.9123019Z I1204 09:55:50.155000 76891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 76945 2025-12-04T09:59:13.9123505Z I1204 09:55:50.156000 76891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 76946 2025-12-04T09:59:13.9124754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9124916Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9126151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9126282Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9127511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9127639Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9128858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9128989Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9131050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9131146Z _warn_cpu_init() 2025-12-04T09:59:13.9133170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.9133269Z _warn_cpu_init() 2025-12-04T09:59:13.9135206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9135293Z _warn_cpu_init() 2025-12-04T09:59:13.9137472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9137598Z _warn_cpu_init() 2025-12-04T09:59:13.9138602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.9138710Z return func(*args, **kwargs) 2025-12-04T09:59:13.9139165Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9139729Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9140730Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9141242Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9142225Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9142625Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9143585Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9144073Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9145063Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9145549Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9146508Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9146955Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9147928Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9148424Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9150033Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T09:59:13.9150363Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9150944Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9151979Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9152302Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9152938Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9153420Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9153839Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9154311Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9155197Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9155649Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9156519Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9156873Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9157725Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:59:13.9158155Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9159035Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9159464Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9160312Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9160704Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9161556Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9161996Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9163431Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 609157120 and is now 628031488. 2025-12-04T09:59:13.9163758Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9164402Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9165369Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9165690Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9166327Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9166834Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9167235Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9167710Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9168592Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9169044Z [rank3]:E1204 09:56:02.197000 76946 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9169916Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9170265Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9171129Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9171585Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9172443Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9172870Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9173726Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9174120Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9174969Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9175407Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9177119Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 
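The leak numbers in these RuntimeErrors come from comparing per-device memory before and after the test body. Below is a minimal sketch of that kind of before/after comparison using public torch.cuda APIs; the check_for_leak helper and its pass/fail condition are illustrative assumptions, not the checker implemented in common_utils.py.

    import torch

    def check_for_leak(test_fn, device: int = 0) -> None:
        # Snapshot caching-allocator bytes and driver-level free memory up front.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, _ = torch.cuda.mem_get_info(device)

        test_fn()

        # Release cached blocks so only live allocations remain, then re-measure.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)

        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible leak on device {device}: allocator "
                f"{alloc_before} -> {alloc_after} bytes, driver free "
                f"{free_before} -> {free_after} bytes"
            )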
2025-12-04T09:59:13.9177565Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9178224Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9179308Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9179671Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9180425Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9180969Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9181419Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9181952Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9182946Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9183456Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9184442Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9184835Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9185827Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9186312Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9187272Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9187757Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9188725Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9189252Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9190101Z [rank2]:E1204 09:56:02.198000 76945 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9190540Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9192009Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.9192550Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9193168Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9194190Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9194558Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9195236Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9195754Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9195848Z dist init r=0, world=4 2025-12-04T09:59:13.9195944Z dist init r=1, world=4 2025-12-04T09:59:13.9196031Z dist init r=2, world=4 2025-12-04T09:59:13.9196118Z dist init r=3, world=4 2025-12-04T09:59:13.9197214Z [rank0]:[W1204 09:56:02.216891347 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.9197308Z FAILED [14.4971s] [100%] 2025-12-04T09:59:13.9197313Z 2025-12-04T09:59:13.9197454Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9197734Z ________ TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda _________ 2025-12-04T09:59:13.9197844Z Traceback (most recent call last): 2025-12-04T09:59:13.9198393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9198495Z self._join_processes(fn) 2025-12-04T09:59:13.9199040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9199177Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9199739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9199849Z raise RuntimeError(error) 2025-12-04T09:59:13.9200065Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9200179Z Traceback (most recent call last): 2025-12-04T09:59:13.9200878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9200988Z getattr(self, test_name)() 2025-12-04T09:59:13.9201508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9201594Z fn() 2025-12-04T09:59:13.9202082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9202186Z method(*args, **kwargs) 2025-12-04T09:59:13.9202675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9202773Z method(*args, **kwargs) 2025-12-04T09:59:13.9203264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9203385Z with policy(): 2025-12-04T09:59:13.9203907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9204008Z raise RuntimeError(msg) 2025-12-04T09:59:13.9205142Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 
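The destroy_process_group() warning just above, and the earlier barrier() warning about using the device under the current context, both go away when the process group is bound to an explicit device and torn down explicitly. A minimal sketch, assuming a PyTorch build whose init_process_group accepts device_id and the usual MASTER_ADDR/MASTER_PORT rendezvous; run is a hypothetical per-rank entry point.

    import os
    import torch
    import torch.distributed as dist

    def run(rank: int, world_size: int) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        # Binding the group to an explicit device silences the barrier() warning.
        dist.init_process_group(
            "nccl",
            rank=rank,
            world_size=world_size,
            device_id=torch.device("cuda", rank),
        )
        try:
            dist.barrier()
        finally:
            # Explicit shutdown avoids the destroy_process_group() warning at exit.
            dist.destroy_process_group()

Each test process would call run once, for example via torch.multiprocessing.spawn(run, args=(world_size,), nprocs=world_size).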
2025-12-04T09:59:13.9205149Z 2025-12-04T09:59:13.9205358Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9205971Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9206004Z 2025-12-04T09:59:13.9206263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9206268Z 2025-12-04T09:59:13.9206424Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9206538Z Traceback (most recent call last): 2025-12-04T09:59:13.9207073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9207178Z getattr(self, test_name)() 2025-12-04T09:59:13.9207699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9207781Z fn() 2025-12-04T09:59:13.9208269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9208369Z method(*args, **kwargs) 2025-12-04T09:59:13.9208855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9208951Z method(*args, **kwargs) 2025-12-04T09:59:13.9209442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9209536Z with policy(): 2025-12-04T09:59:13.9210035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9210163Z raise RuntimeError(msg) 2025-12-04T09:59:13.9211298Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.9211304Z 2025-12-04T09:59:13.9211515Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9212122Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9212127Z 2025-12-04T09:59:13.9212389Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9212396Z 2025-12-04T09:59:13.9212400Z 2025-12-04T09:59:13.9212608Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9212860Z Process 0 terminated with exit code 10, terminating remaining processes. 
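The _warn_cpu_init UserWarnings near the top of this test's output recommend passing device_id so FSDP's sharding initialization runs on the GPU rather than on the CPU copy of the module. A minimal, single-process sketch of that usage; the Linear module and the world size of 1 are placeholders for the real test setup.

    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("nccl", rank=0, world_size=1)
    torch.cuda.set_device(0)

    model = nn.Linear(1024, 1024)  # placeholder module, still on CPU here
    # device_id moves the module to GPU 0 before sharding initialization and
    # satisfies the GPU requirement that sync_module_states=True imposes.
    fsdp_model = FSDP(model, device_id=0, sync_module_states=True)

    dist.destroy_process_group()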
2025-12-04T09:59:13.9213634Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22aad73f608511a0.xml - 2025-12-04T09:59:13.9213795Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9214577Z FAILED [14.4971s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9214693Z Traceback (most recent call last): 2025-12-04T09:59:13.9215223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9215401Z getattr(self, test_name)() 2025-12-04T09:59:13.9215919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9216008Z fn() 2025-12-04T09:59:13.9216589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9216861Z method(*args, **kwargs) 2025-12-04T09:59:13.9217375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9217525Z method(*args, **kwargs) 2025-12-04T09:59:13.9218067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9218159Z with policy(): 2025-12-04T09:59:13.9218665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9218782Z raise RuntimeError(msg) 2025-12-04T09:59:13.9219954Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 
2025-12-04T09:59:13.9219960Z 2025-12-04T09:59:13.9220179Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9221014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9221027Z 2025-12-04T09:59:13.9221299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9221304Z 2025-12-04T09:59:13.9221473Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9221594Z Traceback (most recent call last): 2025-12-04T09:59:13.9222149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9222258Z getattr(self, test_name)() 2025-12-04T09:59:13.9222854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9222947Z fn() 2025-12-04T09:59:13.9223458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9223560Z method(*args, **kwargs) 2025-12-04T09:59:13.9224075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9224179Z method(*args, **kwargs) 2025-12-04T09:59:13.9224687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9224784Z with policy(): 2025-12-04T09:59:13.9225293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9225406Z raise RuntimeError(msg) 2025-12-04T09:59:13.9226575Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.9226581Z 2025-12-04T09:59:13.9226802Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9227432Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9227440Z 2025-12-04T09:59:13.9227699Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9227920Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9228130Z ====================== 1 failed, 26 deselected in 14.72s ======================= 2025-12-04T09:59:13.9228230Z Got exit code 1 2025-12-04T09:59:13.9228332Z Retrying single test... 
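The retry below re-runs the single failing test in a fresh session. The same repro that the error message prints can be driven from a small Python launcher, with the leak check enabled through the environment; paths are relative to the PyTorch repo root, as the log states.

    import os
    import subprocess
    import sys

    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    # Add PYTORCH_PRINT_REPRO_ON_FAILURE="0" to env to suppress the repro hint.
    subprocess.run(
        [
            sys.executable,
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda",
        ],
        env=env,
        check=True,
    )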
2025-12-04T09:59:13.9228957Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22bb81621d944803.xml 2025-12-04T09:59:13.9229122Z ============================= test session starts ============================== 2025-12-04T09:59:13.9229468Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9229574Z cachedir: .pytest_cache 2025-12-04T09:59:13.9230133Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9230252Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9230360Z configfile: pytest.ini 2025-12-04T09:59:13.9230902Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9231115Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9231838Z stepcurrent: skipping 22 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9231948Z Running 1 items in this shard 2025-12-04T09:59:13.9231953Z 2025-12-04T09:59:13.9233163Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda I1204 09:56:09.104000 77228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 77280 2025-12-04T09:59:13.9233634Z I1204 09:56:09.105000 77228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 77281 2025-12-04T09:59:13.9234096Z I1204 09:56:09.105000 77228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 77282 2025-12-04T09:59:13.9234566Z I1204 09:56:09.106000 77228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 77283 2025-12-04T09:59:13.9235774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9235901Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9237072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9237188Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9238349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9238466Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9239629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9239742Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9241676Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9241794Z _warn_cpu_init() 2025-12-04T09:59:13.9243889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9244013Z _warn_cpu_init() 2025-12-04T09:59:13.9245981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9246082Z _warn_cpu_init() 2025-12-04T09:59:13.9248018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9248115Z _warn_cpu_init() 2025-12-04T09:59:13.9249074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.9249187Z return func(*args, **kwargs) 2025-12-04T09:59:13.9249630Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9250175Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9251144Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9251636Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9252599Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9252982Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9253915Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9254488Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9255392Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9255850Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9257076Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9257532Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9258493Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9258986Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9260643Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 
2025-12-04T09:59:13.9261010Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9261672Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9262745Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9263112Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9263833Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9264421Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9264870Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9265397Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9266400Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9266904Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9267898Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9268292Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9269319Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9269747Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9270623Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9271084Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9271932Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9272330Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9273181Z [rank2]:E1204 09:56:21.829000 77282 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9273648Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9275093Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 611254272 and is now 628031488. 2025-12-04T09:59:13.9275414Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9276005Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9276959Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9277287Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9277944Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9278428Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9278824Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9279287Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9280184Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9280633Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9281510Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9281858Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9282708Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9283150Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9284047Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9284487Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9285341Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9285738Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9286625Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9287064Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9288505Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T09:59:13.9288830Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9289414Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9290369Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9290724Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9291359Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9291842Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9292241Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9292709Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9293605Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9294053Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9294934Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
772, in wrapper 2025-12-04T09:59:13.9295281Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9296132Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9296862Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9297839Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9298329Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9299290Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9299766Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9300731Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9301222Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9302844Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 
2025-12-04T09:59:13.9303204Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9303867Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9304973Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9305341Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9306050Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9306593Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9306697Z dist init r=3, world=4 2025-12-04T09:59:13.9306796Z dist init r=0, world=4 2025-12-04T09:59:13.9306896Z dist init r=1, world=4 2025-12-04T09:59:13.9306990Z dist init r=2, world=4 2025-12-04T09:59:13.9308145Z [rank0]:[W1204 09:56:22.856180214 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.9308252Z FAILED [14.4992s] [100%] 2025-12-04T09:59:13.9308257Z 2025-12-04T09:59:13.9308402Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9308816Z ________ TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda _________ 2025-12-04T09:59:13.9308932Z Traceback (most recent call last): 2025-12-04T09:59:13.9309538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9309640Z self._join_processes(fn) 2025-12-04T09:59:13.9310231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9310355Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9310897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9310993Z raise RuntimeError(error) 2025-12-04T09:59:13.9311203Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9311305Z Traceback (most recent call last): 2025-12-04T09:59:13.9311783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9311928Z getattr(self, test_name)() 2025-12-04T09:59:13.9312399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9312477Z fn() 2025-12-04T09:59:13.9312926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9313016Z method(*args, **kwargs) 2025-12-04T09:59:13.9313474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T09:59:13.9313565Z method(*args, **kwargs) 2025-12-04T09:59:13.9314008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9314096Z with policy(): 2025-12-04T09:59:13.9314545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9314640Z raise RuntimeError(msg) 2025-12-04T09:59:13.9315683Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T09:59:13.9315691Z 2025-12-04T09:59:13.9315879Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9316466Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9316471Z 2025-12-04T09:59:13.9316704Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9316708Z 2025-12-04T09:59:13.9316713Z 2025-12-04T09:59:13.9316908Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9317136Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9317839Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22bb81621d944803.xml - 2025-12-04T09:59:13.9317996Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9318708Z FAILED [14.4992s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9318825Z Traceback (most recent call last): 2025-12-04T09:59:13.9319309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9319406Z getattr(self, test_name)() 2025-12-04T09:59:13.9319888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9319967Z fn() 2025-12-04T09:59:13.9320420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9320511Z method(*args, **kwargs) 2025-12-04T09:59:13.9321331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9321451Z method(*args, **kwargs) 2025-12-04T09:59:13.9321966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9322061Z with policy(): 2025-12-04T09:59:13.9322579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9322687Z raise RuntimeError(msg) 2025-12-04T09:59:13.9323865Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. 
CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T09:59:13.9323925Z 2025-12-04T09:59:13.9324137Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9324765Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9324776Z 2025-12-04T09:59:13.9325043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9325220Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9325407Z ====================== 1 failed, 26 deselected in 14.71s ======================= 2025-12-04T09:59:13.9325500Z Got exit code 1 2025-12-04T09:59:13.9326049Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9326465Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.9327089Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e70588b2995dc7c5.xml 2025-12-04T09:59:13.9327261Z ============================= test session starts ============================== 2025-12-04T09:59:13.9327605Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9327708Z cachedir: .pytest_cache 2025-12-04T09:59:13.9328264Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9328385Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9328486Z configfile: pytest.ini 2025-12-04T09:59:13.9329032Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9329246Z collecting ... collected 60 items / 23 deselected / 37 selected 2025-12-04T09:59:13.9329393Z stepcurrent: skipping 23 already run items. 2025-12-04T09:59:13.9329503Z Running 4 items in this shard 2025-12-04T09:59:13.9329511Z 2025-12-04T09:59:13.9330526Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda I1204 09:56:28.384000 77565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 77617 2025-12-04T09:59:13.9331037Z I1204 09:56:28.385000 77565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 77618 2025-12-04T09:59:13.9331526Z I1204 09:56:28.386000 77565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 77619 2025-12-04T09:59:13.9332017Z I1204 09:56:28.386000 77565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 77620 2025-12-04T09:59:13.9333254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9333426Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9335195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. 
FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9335355Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9336605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9336940Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9338663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9338828Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9340064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9340185Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9341898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9342072Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9343391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9343519Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9345226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
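Editor's note: the UserWarning repeated above is advisory. FSDP received `device_id` as a bare "cuda" device with no index, so it falls back to the rank's current device. A minimal sketch of the two fixes the warning itself suggests, assuming the usual one-process-per-GPU setup where each worker knows its rank (the `model` and `rank` names here are placeholders, not taken from the test code):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model: torch.nn.Module, rank: int) -> FSDP:
    # Option 1: make this rank's GPU the current device before FSDP init,
    # so a bare "cuda" device (or no device_id at all) resolves unambiguously.
    torch.cuda.set_device(rank)
    # Option 2: pass an explicit device index instead of the bare "cuda" string.
    return FSDP(model, device_id=rank)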
2025-12-04T09:59:13.9345398Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9345855Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9346391Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9347396Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9347903Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9349145Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9349550Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9350462Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9350917Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9351819Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9352306Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9353208Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9353635Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9354537Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9355095Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9356558Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 718209024 and is now 745472000. 
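Editor's note: the RuntimeError above comes from the harness's CUDA memory-leak check, which compares allocator counters taken before and after the test body (the message also reports driver-level allocation). A rough sketch of that comparison using only public torch.cuda APIs; this illustrates the idea, not the harness's actual implementation:

import gc
import torch

def assert_no_cuda_leak(fn, device: int = 0) -> None:
    # Snapshot caching-allocator usage before the test body runs.
    torch.cuda.synchronize(device)
    before = torch.cuda.memory_allocated(device)
    fn()
    # Drop lingering Python references, then re-read the counter; anything
    # still allocated on the device is a candidate leak.
    gc.collect()
    torch.cuda.synchronize(device)
    after = torch.cuda.memory_allocated(device)
    if after > before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: "
            f"allocated {before} bytes before the test, {after} after"
        )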
2025-12-04T09:59:13.9356905Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9357491Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9358457Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9358786Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9359423Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9359913Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9360310Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9360780Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9361670Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9362117Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9363065Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9363415Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9364271Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9364701Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9365575Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9366009Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9366856Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9367255Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9368109Z [rank1]:E1204 09:56:35.338000 77618 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9368548Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9370010Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 611254272 and is now 636420096. 2025-12-04T09:59:13.9370335Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9370924Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9371899Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9372225Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9372857Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9373347Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9373744Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9374213Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9375126Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9375597Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9376569Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9377125Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9378104Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9378627Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9379584Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9380079Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9381030Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9381478Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9382444Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9382942Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9384596Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 607059968 and is now 636420096. 2025-12-04T09:59:13.9384959Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9385624Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9386725Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9387091Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9387803Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9388346Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9388906Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9389584Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9390589Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9391065Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9391993Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9392694Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9393609Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9394067Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9394965Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9395426Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9396325Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9396753Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9397657Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9398156Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9399679Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 604962816 and is now 636420096. 
2025-12-04T09:59:13.9400020Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9400646Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9401746Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9402077Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9402708Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9403191Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9403281Z dist init r=1, world=4 2025-12-04T09:59:13.9403396Z dist init r=0, world=4 2025-12-04T09:59:13.9403511Z dist init r=3, world=4 2025-12-04T09:59:13.9403593Z dist init r=2, world=4 2025-12-04T09:59:13.9403675Z FAILED [8.7935s] [ 25%] 2025-12-04T09:59:13.9403680Z 2025-12-04T09:59:13.9403817Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9404075Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda ______ 2025-12-04T09:59:13.9404181Z Traceback (most recent call last): 2025-12-04T09:59:13.9404673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9404772Z self._join_processes(fn) 2025-12-04T09:59:13.9405341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9405467Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9406005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9406112Z raise RuntimeError(error) 2025-12-04T09:59:13.9406316Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9406429Z Traceback (most recent call last): 2025-12-04T09:59:13.9406908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9407007Z getattr(self, test_name)() 2025-12-04T09:59:13.9407482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9407560Z fn() 2025-12-04T09:59:13.9408010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9408107Z method(*args, **kwargs) 2025-12-04T09:59:13.9408552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9408648Z method(*args, **kwargs) 2025-12-04T09:59:13.9409097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9409204Z with policy(): 2025-12-04T09:59:13.9409662Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9409755Z raise RuntimeError(msg) 2025-12-04T09:59:13.9410798Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 718209024 and is now 745472000. 2025-12-04T09:59:13.9410811Z 2025-12-04T09:59:13.9411002Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9411572Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9411577Z 2025-12-04T09:59:13.9411814Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9411821Z 2025-12-04T09:59:13.9411825Z 2025-12-04T09:59:13.9412017Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9412251Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9412958Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e70588b2995dc7c5.xml - 2025-12-04T09:59:13.9413107Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9413828Z FAILED [8.7935s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9413996Z Traceback (most recent call last): 2025-12-04T09:59:13.9414496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9414594Z getattr(self, test_name)() 2025-12-04T09:59:13.9415069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9415149Z fn() 2025-12-04T09:59:13.9415597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9415693Z method(*args, **kwargs) 2025-12-04T09:59:13.9416167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9416256Z method(*args, **kwargs) 2025-12-04T09:59:13.9416967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9417068Z with policy(): 2025-12-04T09:59:13.9417576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9417693Z raise RuntimeError(msg) 2025-12-04T09:59:13.9418869Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 718209024 and is now 745472000. 
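Editor's note: each failing rank prints the same repro instruction. To reproduce outside CI, the environment variable enables the leak check and the single test is run from the repo root; a small hypothetical helper wrapping the printed command with subprocess would look like this:

import os
import subprocess

def run_leak_check_repro(test_id: str) -> int:
    # Mirror the command printed in the log: enable the CUDA mem-leak check
    # and run one test from the base repo dir (assumes cwd is the repo root).
    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    cmd = ["python", "test/distributed/fsdp/test_fsdp_core.py", test_id]
    return subprocess.run(cmd, env=env).returncode

# Example, using the test named in the failure above:
# run_leak_check_repro("TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda")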
2025-12-04T09:59:13.9418875Z 2025-12-04T09:59:13.9419098Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9419751Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9419757Z 2025-12-04T09:59:13.9420023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9420205Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9420376Z ======================= 1 failed, 23 deselected in 9.01s ======================= 2025-12-04T09:59:13.9420516Z Got exit code 1 2025-12-04T09:59:13.9420618Z Retrying single test... 2025-12-04T09:59:13.9421456Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b456a18c8ca9135a.xml 2025-12-04T09:59:13.9421629Z ============================= test session starts ============================== 2025-12-04T09:59:13.9421976Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9422093Z cachedir: .pytest_cache 2025-12-04T09:59:13.9422610Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9422738Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9422850Z configfile: pytest.ini 2025-12-04T09:59:13.9423381Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9423598Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9424330Z stepcurrent: skipping 23 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9424440Z Running 1 items in this shard 2025-12-04T09:59:13.9424446Z 2025-12-04T09:59:13.9425466Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda I1204 09:56:41.904000 77894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 77946 2025-12-04T09:59:13.9425967Z I1204 09:56:41.905000 77894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 77947 2025-12-04T09:59:13.9426567Z I1204 09:56:41.906000 77894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 77948 2025-12-04T09:59:13.9427071Z I1204 09:56:41.906000 77894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 77949 2025-12-04T09:59:13.9428327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9428459Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9430224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9430398Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9431629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9431755Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9433512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9433667Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9434799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9434908Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9436437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9436581Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9437675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9437785Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9439299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
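Editor's note: the repeated enable_nested_tensor warning means the test's transformer is not built with batch_first=True, so the encoder cannot take the nested-tensor fast path. A minimal sketch of the construction the warning asks for (the dimensions are illustrative, not the test's):

import torch.nn as nn

# With batch_first=True on the layer, TransformerEncoder can keep
# enable_nested_tensor=True and use the faster nested-tensor path.
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2, enable_nested_tensor=True)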
2025-12-04T09:59:13.9439451Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9439858Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9440373Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9441297Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9441746Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9442623Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9443001Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9443860Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9444291Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9445152Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9445578Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9446423Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9446821Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9447693Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9448130Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9449582Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 720306176 and is now 745472000. 
2025-12-04T09:59:13.9449914Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9450498Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9451473Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9451793Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9452424Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9452909Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9453364Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9453836Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9454720Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9455165Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9456077Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9456502Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9457625Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9458113Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9459081Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9459566Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9460523Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9460971Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9461987Z [rank1]:E1204 09:56:48.848000 77947 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9462483Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9464116Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 609157120 and is now 636420096. 2025-12-04T09:59:13.9464488Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9465148Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9466245Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9466600Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9467314Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9467916Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9468372Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9469015Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9469905Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9470381Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9471263Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9471611Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9472467Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9472900Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9473756Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9474184Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9475056Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9475457Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9476306Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9476747Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9478194Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 604962816 and is now 636420096. 2025-12-04T09:59:13.9478526Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9479111Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9480086Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9480632Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9481361Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9481880Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9482304Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9482811Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9483753Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9484258Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9485201Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9485573Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9486479Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9487122Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9488055Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9488527Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9489500Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9489933Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9490875Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9491356Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9493024Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 607059968 and is now 636420096. 
2025-12-04T09:59:13.9493370Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9493990Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9495058Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9495432Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9496101Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9496860Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9496970Z dist init r=1, world=4 2025-12-04T09:59:13.9497070Z dist init r=3, world=4 2025-12-04T09:59:13.9497176Z dist init r=2, world=4 2025-12-04T09:59:13.9497308Z dist init r=0, world=4 2025-12-04T09:59:13.9497407Z FAILED [8.6697s] [100%] 2025-12-04T09:59:13.9497413Z 2025-12-04T09:59:13.9497558Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9497858Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda ______ 2025-12-04T09:59:13.9497987Z Traceback (most recent call last): 2025-12-04T09:59:13.9498534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9498642Z self._join_processes(fn) 2025-12-04T09:59:13.9499229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9499365Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9499974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9500086Z raise RuntimeError(error) 2025-12-04T09:59:13.9500320Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9500442Z Traceback (most recent call last): 2025-12-04T09:59:13.9500983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9501090Z getattr(self, test_name)() 2025-12-04T09:59:13.9501655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9501740Z fn() 2025-12-04T09:59:13.9502253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9502354Z method(*args, **kwargs) 2025-12-04T09:59:13.9502855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9502962Z method(*args, **kwargs) 2025-12-04T09:59:13.9503460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9503561Z with policy(): 2025-12-04T09:59:13.9504071Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9504176Z raise RuntimeError(msg) 2025-12-04T09:59:13.9505363Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 604962816 and is now 636420096. 2025-12-04T09:59:13.9505369Z 2025-12-04T09:59:13.9505583Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9506237Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9506245Z 2025-12-04T09:59:13.9506509Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9506516Z 2025-12-04T09:59:13.9506549Z 2025-12-04T09:59:13.9506793Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9507060Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9507872Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b456a18c8ca9135a.xml - 2025-12-04T09:59:13.9508044Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9508935Z FAILED [8.6697s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9509088Z Traceback (most recent call last): 2025-12-04T09:59:13.9509579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9509677Z getattr(self, test_name)() 2025-12-04T09:59:13.9510500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9510577Z fn() 2025-12-04T09:59:13.9511023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9511122Z method(*args, **kwargs) 2025-12-04T09:59:13.9511566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9511662Z method(*args, **kwargs) 2025-12-04T09:59:13.9512105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9512192Z with policy(): 2025-12-04T09:59:13.9512656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9512750Z raise RuntimeError(msg) 2025-12-04T09:59:13.9513792Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 604962816 and is now 636420096. 
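Editor's note: the "Started process N with pid ..." and "dist init r=..., world=4" lines come from the test harness spawning one worker per GPU and initializing a process group in each. A stripped-down sketch of that pattern, assuming four GPUs and a localhost rendezvous; the MASTER_ADDR/MASTER_PORT values and the worker body are placeholders, not the harness's real code:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    print(f"dist init r={rank}, world={world_size}")
    # ... run the test body on this rank ...
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(4,), nprocs=4)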
2025-12-04T09:59:13.9513996Z 2025-12-04T09:59:13.9514205Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9514810Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9514815Z 2025-12-04T09:59:13.9515067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9515235Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9515396Z ======================= 1 failed, 26 deselected in 8.89s ======================= 2025-12-04T09:59:13.9515491Z Got exit code 1 2025-12-04T09:59:13.9515589Z Retrying single test... 2025-12-04T09:59:13.9516182Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aedba904eee3ba73.xml 2025-12-04T09:59:13.9516335Z ============================= test session starts ============================== 2025-12-04T09:59:13.9516660Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9516761Z cachedir: .pytest_cache 2025-12-04T09:59:13.9517246Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9517364Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9517461Z configfile: pytest.ini 2025-12-04T09:59:13.9517964Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9518172Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9518910Z stepcurrent: skipping 23 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9519013Z Running 1 items in this shard 2025-12-04T09:59:13.9519018Z 2025-12-04T09:59:13.9519967Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda I1204 09:56:55.434000 78223 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 78275 2025-12-04T09:59:13.9520436Z I1204 09:56:55.435000 78223 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 78276 2025-12-04T09:59:13.9521053Z I1204 09:56:55.435000 78223 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 78277 2025-12-04T09:59:13.9521808Z I1204 09:56:55.436000 78223 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 78278 2025-12-04T09:59:13.9523079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9523207Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9524931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9525106Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9526343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9526479Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9527781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9527911Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9529140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9529264Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9530992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9531160Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9532871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9533033Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9534909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9535102Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9535549Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9536076Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9537336Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9537859Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9538847Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9539257Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9540224Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9540710Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9541685Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9542204Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9543176Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9543618Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9544584Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9545071Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9546703Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 607059968 and is now 636420096. 
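Reading the leak report above: with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 the test wrapper snapshots CUDA memory on each device before the test body runs and compares it again afterwards; here the caching allocator on device 2 went from 512 to 22528 bytes and driver-allocated memory grew from 607059968 to 636420096 bytes, so the check raises. The sketch below only illustrates that before/after comparison with public torch.cuda calls; it is not the torch.testing._internal implementation, and both the zero-growth tolerance and the whole-device mem_get_info() query are assumptions made for illustration.

    # Rough sketch of the before/after CUDA memory comparison described in the
    # leak report above. NOT torch.testing._internal's implementation:
    # mem_get_info() is a whole-device (all processes) view, and the zero-growth
    # tolerance is an assumption.
    import torch

    def check_for_cuda_leak(run_test, device=0, tolerance_bytes=0):
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)      # caching allocator bytes
        free_before, total = torch.cuda.mem_get_info(device)    # driver-level free/total
        driver_before = total - free_before

        run_test()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        if alloc_after > alloc_before + tolerance_bytes:
            raise RuntimeError(
                f"Possible CUDA leak: caching allocator was {alloc_before} bytes "
                f"and is now {alloc_after} on device {device}; driver-allocated "
                f"memory went from {driver_before} to {driver_after}."
            )

In this run the same message fires on all four ranks, which is why every worker process exits with code 10 and the parent then reports the failure it saw from process 1.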
2025-12-04T09:59:13.9547072Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9547729Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9549148Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9549514Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9550195Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9550703Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9551123Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9551726Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9552757Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9553218Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9554092Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9554447Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9555296Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9555729Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9556617Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9557048Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9557899Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9558295Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9559158Z [rank0]:E1204 09:57:02.344000 78275 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9559594Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9561036Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 720306176 and is now 745472000. 2025-12-04T09:59:13.9561365Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9561973Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9562991Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9563308Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9563951Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9564432Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9564854Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9565332Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9566217Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9566671Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9567544Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9567901Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9568758Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9569210Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9570073Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9570504Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9571364Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9571762Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9572622Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9573052Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9574502Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 607059968 and is now 636420096. 2025-12-04T09:59:13.9574896Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9575480Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9576526Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9577046Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9577770Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9578348Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9578801Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9579339Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9580334Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9580842Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9581832Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9582238Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9583225Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9583707Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9584670Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9585156Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9586120Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9586563Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9587525Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9588010Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9589697Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 604962816 and is now 636420096. 
2025-12-04T09:59:13.9590055Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9590638Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9591611Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9591957Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9592604Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9593088Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9593176Z dist init r=3, world=4 2025-12-04T09:59:13.9593274Z dist init r=2, world=4 2025-12-04T09:59:13.9593360Z dist init r=1, world=4 2025-12-04T09:59:13.9593445Z dist init r=0, world=4 2025-12-04T09:59:13.9593539Z FAILED [8.8629s] [100%] 2025-12-04T09:59:13.9593544Z 2025-12-04T09:59:13.9593673Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9593942Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda ______ 2025-12-04T09:59:13.9594053Z Traceback (most recent call last): 2025-12-04T09:59:13.9594535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9594644Z self._join_processes(fn) 2025-12-04T09:59:13.9595161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9595282Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9595849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9595947Z raise RuntimeError(error) 2025-12-04T09:59:13.9596161Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.9596265Z Traceback (most recent call last): 2025-12-04T09:59:13.9596744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9596851Z getattr(self, test_name)() 2025-12-04T09:59:13.9597318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9597398Z fn() 2025-12-04T09:59:13.9597853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9597944Z method(*args, **kwargs) 2025-12-04T09:59:13.9598392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9598482Z method(*args, **kwargs) 2025-12-04T09:59:13.9598927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9599018Z with policy(): 2025-12-04T09:59:13.9599467Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9599570Z raise RuntimeError(msg) 2025-12-04T09:59:13.9600632Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 607059968 and is now 636420096. 2025-12-04T09:59:13.9600663Z 2025-12-04T09:59:13.9600855Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9601430Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9601435Z 2025-12-04T09:59:13.9601666Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9601670Z 2025-12-04T09:59:13.9601674Z 2025-12-04T09:59:13.9601875Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9602129Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9602848Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aedba904eee3ba73.xml - 2025-12-04T09:59:13.9603006Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9603722Z FAILED [8.8629s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.9603832Z Traceback (most recent call last): 2025-12-04T09:59:13.9604314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9604411Z getattr(self, test_name)() 2025-12-04T09:59:13.9604891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9604971Z fn() 2025-12-04T09:59:13.9605423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9605516Z method(*args, **kwargs) 2025-12-04T09:59:13.9605967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9606063Z method(*args, **kwargs) 2025-12-04T09:59:13.9606531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9606616Z with policy(): 2025-12-04T09:59:13.9607069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9607163Z raise RuntimeError(msg) 2025-12-04T09:59:13.9608210Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 607059968 and is now 636420096. 
2025-12-04T09:59:13.9608219Z 2025-12-04T09:59:13.9608410Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9608977Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9608990Z 2025-12-04T09:59:13.9609223Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9609378Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9609538Z ======================= 1 failed, 26 deselected in 9.08s ======================= 2025-12-04T09:59:13.9609621Z Got exit code 1 2025-12-04T09:59:13.9610120Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9610486Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.9611073Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3d36f137cb39b5.xml 2025-12-04T09:59:13.9611251Z ============================= test session starts ============================== 2025-12-04T09:59:13.9611557Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9611650Z cachedir: .pytest_cache 2025-12-04T09:59:13.9612107Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9612213Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9612309Z configfile: pytest.ini 2025-12-04T09:59:13.9612783Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9612996Z collecting ... collected 60 items / 24 deselected / 36 selected 2025-12-04T09:59:13.9613124Z stepcurrent: skipping 24 already run items. 2025-12-04T09:59:13.9613223Z Running 3 items in this shard 2025-12-04T09:59:13.9613227Z 2025-12-04T09:59:13.9614161Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda I1204 09:57:09.044000 78552 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 78604 2025-12-04T09:59:13.9614610Z I1204 09:57:09.045000 78552 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 78605 2025-12-04T09:59:13.9615044Z I1204 09:57:09.046000 78552 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 78606 2025-12-04T09:59:13.9615480Z I1204 09:57:09.046000 78552 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 78607 2025-12-04T09:59:13.9616659Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9616965Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9618731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9618901Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9620147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9620275Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9622226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9622400Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9623635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9623762Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9625526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9625737Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9626969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9627101Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9628858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9629030Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9629489Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9630024Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9631037Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9631544Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9632552Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9633088Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9633957Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9634386Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9635238Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9635678Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9636524Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9636923Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9637777Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9638218Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9639724Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 
2025-12-04T09:59:13.9640076Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9640662Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9641674Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9642026Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9642664Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9643158Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9643558Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9644026Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9644926Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9645385Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9646287Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9646639Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9647493Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9647922Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9648777Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9649216Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9650059Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9650457Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9651309Z [rank1]:E1204 09:57:15.907000 78605 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9651798Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9653273Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.9653601Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9654182Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9655207Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9655540Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9656171Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9656901Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9657355Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9657889Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9658892Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9659446Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9660445Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9660840Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9661804Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9662293Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9663247Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9663737Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9664691Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9665140Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9666160Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9666659Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9668317Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.9668825Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9669451Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9670515Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9670858Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9671533Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9672047Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9672470Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9672966Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9673939Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9674415Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9675344Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9675715Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9676757Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9677193Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9678039Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9678477Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9679326Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9679779Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9680634Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9681071Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9682547Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.9682899Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9683491Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9684491Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9684818Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9685453Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9685938Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9686033Z dist init r=1, world=4 2025-12-04T09:59:13.9686119Z dist init r=0, world=4 2025-12-04T09:59:13.9686208Z dist init r=3, world=4 2025-12-04T09:59:13.9686290Z dist init r=2, world=4 2025-12-04T09:59:13.9686399Z FAILED [8.7592s] [ 33%] 2025-12-04T09:59:13.9686404Z 2025-12-04T09:59:13.9686541Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9686814Z __ TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda ___ 2025-12-04T09:59:13.9686930Z Traceback (most recent call last): 2025-12-04T09:59:13.9687415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9687513Z self._join_processes(fn) 2025-12-04T09:59:13.9688032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9688159Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9688694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9688800Z raise RuntimeError(error) 2025-12-04T09:59:13.9689006Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.9689116Z Traceback (most recent call last): 2025-12-04T09:59:13.9689594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9689689Z getattr(self, test_name)() 2025-12-04T09:59:13.9690166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9690242Z fn() 2025-12-04T09:59:13.9690689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9690838Z method(*args, **kwargs) 2025-12-04T09:59:13.9691284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9691381Z method(*args, **kwargs) 2025-12-04T09:59:13.9691824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9691906Z with policy(): 
2025-12-04T09:59:13.9692357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9692449Z raise RuntimeError(msg) 2025-12-04T09:59:13.9693529Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.9693563Z 2025-12-04T09:59:13.9693753Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9694355Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9694362Z 2025-12-04T09:59:13.9694602Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9694607Z 2025-12-04T09:59:13.9694611Z 2025-12-04T09:59:13.9694802Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9695034Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9695743Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3d36f137cb39b5.xml - 2025-12-04T09:59:13.9695891Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9696878Z FAILED [8.7592s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.9697006Z Traceback (most recent call last): 2025-12-04T09:59:13.9697595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9697705Z getattr(self, test_name)() 2025-12-04T09:59:13.9698236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9698327Z fn() 2025-12-04T09:59:13.9698832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9698941Z method(*args, **kwargs) 2025-12-04T09:59:13.9699440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9699548Z method(*args, **kwargs) 2025-12-04T09:59:13.9700052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9700144Z with policy(): 2025-12-04T09:59:13.9700650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9700759Z raise RuntimeError(msg) 2025-12-04T09:59:13.9701966Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 
2025-12-04T09:59:13.9701974Z 2025-12-04T09:59:13.9702190Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9702899Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9702933Z 2025-12-04T09:59:13.9703199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9703377Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9703549Z ======================= 1 failed, 24 deselected in 8.98s ======================= 2025-12-04T09:59:13.9703649Z Got exit code 1 2025-12-04T09:59:13.9703750Z Retrying single test... 2025-12-04T09:59:13.9704372Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-973a0dc84b27de93.xml 2025-12-04T09:59:13.9704581Z ============================= test session starts ============================== 2025-12-04T09:59:13.9704927Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9705037Z cachedir: .pytest_cache 2025-12-04T09:59:13.9705551Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9705671Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9705787Z configfile: pytest.ini 2025-12-04T09:59:13.9706323Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9706535Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9707299Z stepcurrent: skipping 24 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9707410Z Running 1 items in this shard 2025-12-04T09:59:13.9707415Z 2025-12-04T09:59:13.9708459Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda I1204 09:57:22.414000 78865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 78917 2025-12-04T09:59:13.9709165Z I1204 09:57:22.415000 78865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 78918 2025-12-04T09:59:13.9709642Z I1204 09:57:22.416000 78865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 78919 2025-12-04T09:59:13.9710073Z I1204 09:57:22.416000 78865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 78920 2025-12-04T09:59:13.9711177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9711294Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9712818Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9712973Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9714068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9714187Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9715278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9715441Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9716962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9717107Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9718626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9718803Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9719902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9720009Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9721890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9722068Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9722533Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9723078Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9724136Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9724655Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9725646Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9726045Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9727011Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9727494Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9728460Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9728948Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9729951Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9730700Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9731670Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9732165Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9733920Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 0. CUDA driver allocated memory was 716111872 and is now 734986240. 
2025-12-04T09:59:13.9734293Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9734876Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9735888Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9736205Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9737123Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9737685Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9738168Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9738706Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9739701Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9740215Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9741200Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9741593Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9742556Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9743037Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9743997Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9744502Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9745502Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9745945Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9746902Z [rank3]:E1204 09:57:29.237000 78920 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9747429Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9749176Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:13.9749504Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9750081Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9751093Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9751411Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9752047Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9752559Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9752958Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9753436Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9754329Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9754785Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9755669Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9756016Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9756879Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9757307Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9758188Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9758719Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9759569Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9759965Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9760846Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9761294Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9762766Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.9763097Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9763677Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9764686Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9765008Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9765669Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9766161Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9766559Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9767033Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9767917Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9768374Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9769248Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9769597Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9770454Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9770935Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9771786Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9772218Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9773063Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9773482Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9774335Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9774776Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9776244Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.9776655Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9777481Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9778658Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9779020Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9779736Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9785700Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9785853Z dist init r=3, world=4 2025-12-04T09:59:13.9785951Z dist init r=1, world=4 2025-12-04T09:59:13.9786052Z dist init r=2, world=4 2025-12-04T09:59:13.9786150Z dist init r=0, world=4 2025-12-04T09:59:13.9786247Z FAILED [8.6879s] [100%] 2025-12-04T09:59:13.9786255Z 2025-12-04T09:59:13.9786414Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9786731Z __ TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda ___ 2025-12-04T09:59:13.9786850Z Traceback (most recent call last): 2025-12-04T09:59:13.9787415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9787524Z self._join_processes(fn) 2025-12-04T09:59:13.9788115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9788256Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9788859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9789159Z raise RuntimeError(error) 2025-12-04T09:59:13.9789399Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.9789504Z Traceback (most recent call last): 2025-12-04T09:59:13.9789997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9790095Z getattr(self, test_name)() 2025-12-04T09:59:13.9790575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9790651Z fn() 2025-12-04T09:59:13.9791100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9791226Z method(*args, **kwargs) 2025-12-04T09:59:13.9791672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9791763Z method(*args, **kwargs) 2025-12-04T09:59:13.9792209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9792299Z with policy(): 
2025-12-04T09:59:13.9792758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9792849Z raise RuntimeError(msg) 2025-12-04T09:59:13.9793918Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:13.9793927Z 2025-12-04T09:59:13.9794122Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9794726Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9794735Z 2025-12-04T09:59:13.9794974Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9794979Z 2025-12-04T09:59:13.9794983Z 2025-12-04T09:59:13.9795209Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9795446Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9796151Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-973a0dc84b27de93.xml - 2025-12-04T09:59:13.9796300Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9797056Z FAILED [8.6879s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.9797165Z Traceback (most recent call last): 2025-12-04T09:59:13.9797659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9797757Z getattr(self, test_name)() 2025-12-04T09:59:13.9798233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9798315Z fn() 2025-12-04T09:59:13.9798761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9798850Z method(*args, **kwargs) 2025-12-04T09:59:13.9799298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9799392Z method(*args, **kwargs) 2025-12-04T09:59:13.9799840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9799953Z with policy(): 2025-12-04T09:59:13.9800427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9800528Z raise RuntimeError(msg) 2025-12-04T09:59:13.9801600Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 
2025-12-04T09:59:13.9801605Z 2025-12-04T09:59:13.9801800Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9802407Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9802441Z 2025-12-04T09:59:13.9802675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9802842Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9802995Z ======================= 1 failed, 26 deselected in 8.90s ======================= 2025-12-04T09:59:13.9803084Z Got exit code 1 2025-12-04T09:59:13.9803180Z Retrying single test... 2025-12-04T09:59:13.9803730Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e9342b39aaf3792.xml 2025-12-04T09:59:13.9803876Z ============================= test session starts ============================== 2025-12-04T09:59:13.9804180Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9804273Z cachedir: .pytest_cache 2025-12-04T09:59:13.9804732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9804837Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9804935Z configfile: pytest.ini 2025-12-04T09:59:13.9805408Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9805597Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9806323Z stepcurrent: skipping 24 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9806422Z Running 1 items in this shard 2025-12-04T09:59:13.9806427Z 2025-12-04T09:59:13.9807361Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda I1204 09:57:35.884000 79178 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 79230 2025-12-04T09:59:13.9807804Z I1204 09:57:35.885000 79178 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 79231 2025-12-04T09:59:13.9808242Z I1204 09:57:35.886000 79178 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 79232 2025-12-04T09:59:13.9808677Z I1204 09:57:35.886000 79178 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 79233 2025-12-04T09:59:13.9809782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9809904Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9811432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9811644Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9812738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9812846Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9813951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9814087Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9815619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9815768Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9817676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9817846Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9819077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9819211Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9821211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
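The repeated UserWarning above also names its remediation: call torch.cuda.set_device() before FSDP initialization, or pass a device with an explicit index as device_id rather than the bare "cuda" device. A minimal sketch under those assumptions (the function and rank handling are illustrative, and a default process group is assumed to be initialized already):

```python
# Minimal sketch of the remediation suggested by the FSDP `device_id` warning;
# illustrative only. Assumes torch.distributed.init_process_group() was called.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_rank(model: nn.Module, rank: int) -> FSDP:
    torch.cuda.set_device(rank)                                # option 1: make the rank's device current
    return FSDP(model, device_id=torch.device("cuda", rank))   # option 2: pass an explicitly indexed device
```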
2025-12-04T09:59:13.9821393Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9821857Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9822401Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9823406Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9823913Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9824910Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9825312Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9826316Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9826845Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9827814Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9828298Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9829250Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9829745Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9830712Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9831205Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9832992Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 0. CUDA driver allocated memory was 707723264 and is now 734986240. 
2025-12-04T09:59:13.9833349Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9833973Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9835065Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9835418Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9836090Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9836605Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9837030Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9837541Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9838675Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9839166Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9840135Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9840517Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9841621Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9842080Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9842991Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9843471Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9844373Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9844798Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9845706Z [rank2]:E1204 09:57:42.754000 79232 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9846172Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9847740Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 2. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.9848093Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9848737Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9849905Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9850233Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9850866Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9851357Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9851752Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9852226Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9853111Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9853561Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9854473Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9854846Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9855705Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9856133Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9857281Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9857822Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9858783Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9859233Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9860195Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9860694Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9862361Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.9862761Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9863422Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9864549Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9864916Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9865635Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9866184Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9866631Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9867168Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9868168Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9868824Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9869827Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9870208Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9871134Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9871637Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9872565Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9873035Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9873962Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9874396Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9875325Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9875800Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9877452Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.9877804Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9878443Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9879548Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9879902Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9880695Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9881386Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9881491Z dist init r=2, world=4 2025-12-04T09:59:13.9881583Z dist init r=1, world=4 2025-12-04T09:59:13.9881680Z dist init r=0, world=4 2025-12-04T09:59:13.9881773Z dist init r=3, world=4 2025-12-04T09:59:13.9881861Z FAILED [8.7156s] [100%] 2025-12-04T09:59:13.9881867Z 2025-12-04T09:59:13.9882014Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9882370Z __ TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda ___ 2025-12-04T09:59:13.9882483Z Traceback (most recent call last): 2025-12-04T09:59:13.9883017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9883126Z self._join_processes(fn) 2025-12-04T09:59:13.9883704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9883842Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9884429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9884576Z raise RuntimeError(error) 2025-12-04T09:59:13.9884806Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9884918Z Traceback (most recent call last): 2025-12-04T09:59:13.9885451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9885555Z getattr(self, test_name)() 2025-12-04T09:59:13.9886080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9886164Z fn() 2025-12-04T09:59:13.9886656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9886761Z method(*args, **kwargs) 2025-12-04T09:59:13.9887246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9887346Z method(*args, **kwargs) 2025-12-04T09:59:13.9887836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9887925Z with policy(): 
2025-12-04T09:59:13.9888433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9888534Z raise RuntimeError(msg) 2025-12-04T09:59:13.9889736Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 0. CUDA driver allocated memory was 707723264 and is now 734986240. 2025-12-04T09:59:13.9889747Z 2025-12-04T09:59:13.9889953Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9890607Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9890615Z 2025-12-04T09:59:13.9890880Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9890885Z 2025-12-04T09:59:13.9890892Z 2025-12-04T09:59:13.9891107Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9891365Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9892145Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e9342b39aaf3792.xml - 2025-12-04T09:59:13.9892307Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9893124Z FAILED [8.7156s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9893240Z Traceback (most recent call last): 2025-12-04T09:59:13.9893779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9893887Z getattr(self, test_name)() 2025-12-04T09:59:13.9894462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9894553Z fn() 2025-12-04T09:59:13.9895043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9895140Z method(*args, **kwargs) 2025-12-04T09:59:13.9895630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9895727Z method(*args, **kwargs) 2025-12-04T09:59:13.9896217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9896412Z with policy(): 2025-12-04T09:59:13.9897089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9897207Z raise RuntimeError(msg) 2025-12-04T09:59:13.9898422Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 0. CUDA driver allocated memory was 707723264 and is now 734986240. 
2025-12-04T09:59:13.9898428Z 2025-12-04T09:59:13.9898651Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9899332Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9899338Z 2025-12-04T09:59:13.9899599Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9899787Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9899960Z ======================= 1 failed, 26 deselected in 8.93s ======================= 2025-12-04T09:59:13.9900062Z Got exit code 1 2025-12-04T09:59:13.9900666Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9901109Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.9901745Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15b775a41cf5a439.xml 2025-12-04T09:59:13.9901904Z ============================= test session starts ============================== 2025-12-04T09:59:13.9902255Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9902362Z cachedir: .pytest_cache 2025-12-04T09:59:13.9902871Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9902997Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9903103Z configfile: pytest.ini 2025-12-04T09:59:13.9903636Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9903858Z collecting ... collected 60 items / 25 deselected / 35 selected 2025-12-04T09:59:13.9903999Z stepcurrent: skipping 25 already run items. 
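Each pytest session header above prints the active hypothesis profile, 'pytorch_ci', with database=None, max_examples=50, derandomize=True, and HealthCheck.too_slow suppressed. The sketch below shows how such a profile would be registered with the hypothesis library; the values come from the log line, but where PyTorch actually registers the profile is not shown here, so the registration site is an assumption.

```python
# Sketch of registering the settings profile named in the session header;
# illustrative placement, values taken from the log line itself.
from hypothesis import HealthCheck, settings

settings.register_profile(
    "pytorch_ci",
    database=None,
    max_examples=50,
    derandomize=True,
    suppress_health_check=[HealthCheck.too_slow],
)
settings.load_profile("pytorch_ci")
```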
2025-12-04T09:59:13.9904110Z Running 2 items in this shard 2025-12-04T09:59:13.9904121Z 2025-12-04T09:59:13.9905181Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda I1204 09:57:49.294000 79491 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 79543 2025-12-04T09:59:13.9905677Z I1204 09:57:49.295000 79491 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 79544 2025-12-04T09:59:13.9906174Z I1204 09:57:49.295000 79491 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 79545 2025-12-04T09:59:13.9906731Z I1204 09:57:49.296000 79491 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 79546 2025-12-04T09:59:13.9907991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9908117Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9909288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.9909480Z {} 2025-12-04T09:59:13.9909760Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.9909961Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.9911490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9911642Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9912738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9912850Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9913737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.9913890Z {} 2025-12-04T09:59:13.9914201Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.9914392Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.9915907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9916062Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9917159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9917281Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9918159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.9918308Z {} 2025-12-04T09:59:13.9918591Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.9918778Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.9920327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9920497Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9922060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9922186Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9923244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.9923421Z {} 2025-12-04T09:59:13.9923739Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.9923954Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.9925669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9925831Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9926295Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9926827Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9927839Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9928380Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9929372Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9929769Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9930731Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9931224Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9932180Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9932668Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9933825Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9934253Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9935236Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9935709Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9937557Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 
2025-12-04T09:59:13.9937965Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9938630Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9939761Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:13.9940126Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9940845Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9941390Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9941847Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9942374Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9943412Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9943915Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9944911Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9945305Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9946264Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9946756Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9947717Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9948208Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9949393Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9949849Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9950751Z [rank2]:E1204 09:57:56.110000 79545 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9951208Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9952800Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 2. CUDA driver allocated memory was 579796992 and is now 625934336. 2025-12-04T09:59:13.9953160Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9953750Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9954747Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:13.9955072Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9955705Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9956191Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9956593Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9957085Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9957975Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9958428Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9959309Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9959665Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9960520Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9960950Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9961809Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9962276Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9963163Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9963554Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9964406Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9964864Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9966338Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.9966660Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9967242Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9968243Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:13.9968563Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9969205Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9969769Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9970165Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9970637Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9971524Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9971982Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9972858Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9973217Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9974074Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9974504Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9975380Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9975834Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9976955Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9977404Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9978368Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9978895Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9980552Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.9980920Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9981573Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9982709Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:13.9983071Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9983816Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9984360Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9984457Z dist init r=0, world=4 2025-12-04T09:59:13.9984558Z dist init r=2, world=4 2025-12-04T09:59:13.9984651Z dist init r=3, world=4 2025-12-04T09:59:13.9984745Z dist init r=1, world=4 2025-12-04T09:59:13.9984845Z FAILED [8.5663s] [ 50%] 2025-12-04T09:59:13.9984851Z 2025-12-04T09:59:13.9985000Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9985314Z ___ TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda ___ 2025-12-04T09:59:13.9985430Z Traceback (most recent call last): 2025-12-04T09:59:13.9985978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9986093Z self._join_processes(fn) 2025-12-04T09:59:13.9986678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9986820Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9987421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9987532Z raise RuntimeError(error) 2025-12-04T09:59:13.9987767Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9987882Z Traceback (most recent call last): 2025-12-04T09:59:13.9988474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9988591Z getattr(self, test_name)() 2025-12-04T09:59:13.9989213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9989296Z fn() 2025-12-04T09:59:13.9989745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9989835Z method(*args, **kwargs) 2025-12-04T09:59:13.9990283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9990401Z method(*args, **kwargs) 2025-12-04T09:59:13.9990846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9990934Z with policy(): 
2025-12-04T09:59:13.9991552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9991661Z raise RuntimeError(msg) 2025-12-04T09:59:13.9992791Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 2. CUDA driver allocated memory was 579796992 and is now 625934336. 2025-12-04T09:59:13.9992797Z 2025-12-04T09:59:13.9992996Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9993634Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:13.9993641Z 2025-12-04T09:59:13.9993888Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9993893Z 2025-12-04T09:59:13.9993900Z 2025-12-04T09:59:13.9994108Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9994352Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9995135Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15b775a41cf5a439.xml - 2025-12-04T09:59:13.9995294Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9996078Z FAILED [8.5663s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9996195Z Traceback (most recent call last): 2025-12-04T09:59:13.9996706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9996812Z getattr(self, test_name)() 2025-12-04T09:59:13.9997317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9997397Z fn() 2025-12-04T09:59:13.9997876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9997971Z method(*args, **kwargs) 2025-12-04T09:59:13.9998441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9998537Z method(*args, **kwargs) 2025-12-04T09:59:13.9999007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9999100Z with policy(): 2025-12-04T09:59:13.9999574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9999698Z raise RuntimeError(msg) 2025-12-04T09:59:14.0000860Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 2. CUDA driver allocated memory was 579796992 and is now 625934336. 
2025-12-04T09:59:14.0000867Z 2025-12-04T09:59:14.0001068Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0001883Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0001890Z 2025-12-04T09:59:14.0002144Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0002349Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:14.0002520Z ======================= 1 failed, 25 deselected in 8.78s ======================= 2025-12-04T09:59:14.0002728Z Got exit code 1 2025-12-04T09:59:14.0002832Z Retrying single test... 2025-12-04T09:59:14.0003415Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56374ffd8bd068de.xml 2025-12-04T09:59:14.0003568Z ============================= test session starts ============================== 2025-12-04T09:59:14.0003896Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0003996Z cachedir: .pytest_cache 2025-12-04T09:59:14.0004482Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0004600Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0004697Z configfile: pytest.ini 2025-12-04T09:59:14.0005206Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0005406Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:14.0006120Z stepcurrent: skipping 25 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0006236Z Running 1 items in this shard 2025-12-04T09:59:14.0006288Z 2025-12-04T09:59:14.0007264Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda I1204 09:58:02.594000 79804 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 79856 2025-12-04T09:59:14.0007739Z I1204 09:58:02.595000 79804 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 79857 2025-12-04T09:59:14.0008203Z I1204 09:58:02.596000 79804 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 79858 2025-12-04T09:59:14.0008661Z I1204 09:58:02.596000 79804 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 79859 2025-12-04T09:59:14.0009840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0009957Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0010892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0011054Z {} 2025-12-04T09:59:14.0011354Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 
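The UserWarning above concerns combining an FSDP MixedPrecision policy with an auto_wrap_policy: submodule classes listed in the warning are wrapped as separate FSDP units with mixed precision disabled. A rough single-process sketch of that combination follows; the model shape, port, and single-rank process group are placeholder assumptions for illustration, not the test's actual setup.

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import ModuleWrapPolicy

def wrap_with_mixed_precision(model: nn.Module) -> FSDP:
    # float16 params, gradients, and buffers inside the wrapped units...
    mp = MixedPrecision(
        param_dtype=torch.float16,
        reduce_dtype=torch.float16,
        buffer_dtype=torch.float16,
    )
    # ...and wrap each TransformerEncoderLayer as its own FSDP unit.
    policy = ModuleWrapPolicy({nn.TransformerEncoderLayer})
    return FSDP(
        model,
        auto_wrap_policy=policy,
        mixed_precision=mp,
        device_id=torch.cuda.current_device(),  # explicit index, see the device_id warning
    )

if __name__ == "__main__" and torch.cuda.is_available():
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=0, world_size=1)
    torch.cuda.set_device(0)
    encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
    model = nn.TransformerEncoder(encoder_layer, num_layers=2)
    fsdp_model = wrap_with_mixed_precision(model)
    dist.destroy_process_group()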
2025-12-04T09:59:14.0011560Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0013509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0013700Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0014857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0015005Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0015936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0016099Z {} 2025-12-04T09:59:14.0016481Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0016690Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0018577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0018744Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0019982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0020112Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0021617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0021749Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0022735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0022916Z {} 2025-12-04T09:59:14.0023235Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0023450Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0025160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0025322Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0026315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0026485Z {} 2025-12-04T09:59:14.0026798Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0027089Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0028799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0028966Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0029426Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0030001Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0031010Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0031516Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0032508Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0033011Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0034063Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0034634Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0035507Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0035942Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0036785Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0037182Z [rank1]:E1204 09:58:09.452000 79857 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0038035Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0038476Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0039949Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:14.0040274Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0040857Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0041907Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0042231Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0042861Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0043348Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:14.0043773Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0044239Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0045130Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0045575Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0046457Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0046805Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0047659Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0048117Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:59:14.0048968Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0049404Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0050251Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0050649Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0051506Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0051939Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0053405Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 0. CUDA driver allocated memory was 716111872 and is now 734986240. 2025-12-04T09:59:14.0053732Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0054368Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0055365Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0055688Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0056383Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0057106Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:14.0057563Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0058092Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0059093Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0059596Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0060588Z [rank2]:E1204 09:58:09.453000 79858 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0060984Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0061985Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0062468Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0063427Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0063917Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0064873Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0065325Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0066282Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0066773Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0068455Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 2. CUDA driver allocated memory was 609157120 and is now 625934336. 
2025-12-04T09:59:14.0068848Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0069566Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0070571Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0070926Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0071558Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0072045Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:14.0072443Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0072909Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0073792Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0074242Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0075120Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0075493Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0076345Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0076775Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0077620Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0078060Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0078912Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0079310Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0080158Z [rank3]:E1204 09:58:09.456000 79859 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0080595Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0082374Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:14.0082702Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0083282Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0084281Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0084641Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0085271Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0085760Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:14.0085849Z dist init r=1, world=4 2025-12-04T09:59:14.0085933Z dist init r=3, world=4 2025-12-04T09:59:14.0086019Z dist init r=2, world=4 2025-12-04T09:59:14.0086100Z dist init r=0, world=4 2025-12-04T09:59:14.0086183Z FAILED [8.7289s] [100%] 2025-12-04T09:59:14.0086191Z 2025-12-04T09:59:14.0086326Z =================================== FAILURES =================================== 2025-12-04T09:59:14.0086594Z ___ TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda ___ 2025-12-04T09:59:14.0086705Z Traceback (most recent call last): 2025-12-04T09:59:14.0087195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:14.0087289Z self._join_processes(fn) 2025-12-04T09:59:14.0087833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:14.0087955Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:14.0088486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:14.0088587Z raise RuntimeError(error) 2025-12-04T09:59:14.0088789Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:14.0088899Z Traceback (most recent call last): 2025-12-04T09:59:14.0089375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0089472Z getattr(self, test_name)() 2025-12-04T09:59:14.0089947Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0090024Z fn() 2025-12-04T09:59:14.0090476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0090569Z method(*args, **kwargs) 2025-12-04T09:59:14.0091013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0091105Z method(*args, **kwargs) 2025-12-04T09:59:14.0091548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0091632Z with policy(): 2025-12-04T09:59:14.0092082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0092225Z raise RuntimeError(msg) 2025-12-04T09:59:14.0093300Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:14.0093305Z 2025-12-04T09:59:14.0093493Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0094089Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0094098Z 2025-12-04T09:59:14.0094360Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0094365Z 2025-12-04T09:59:14.0094369Z 2025-12-04T09:59:14.0094563Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:14.0094800Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:14.0095501Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56374ffd8bd068de.xml - 2025-12-04T09:59:14.0095656Z =========================== short test summary info ============================ 2025-12-04T09:59:14.0096481Z FAILED [8.7289s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:14.0096592Z Traceback (most recent call last): 2025-12-04T09:59:14.0097310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0097424Z getattr(self, test_name)() 2025-12-04T09:59:14.0097959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0098051Z fn() 2025-12-04T09:59:14.0098557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0098667Z method(*args, **kwargs) 2025-12-04T09:59:14.0099209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0099309Z method(*args, **kwargs) 2025-12-04T09:59:14.0099813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0099906Z with policy(): 2025-12-04T09:59:14.0100410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0100524Z raise RuntimeError(msg) 2025-12-04T09:59:14.0101729Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:14.0101737Z 2025-12-04T09:59:14.0101956Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0102631Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0102636Z 2025-12-04T09:59:14.0102900Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0103074Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:14.0103246Z ======================= 1 failed, 26 deselected in 8.95s ======================= 2025-12-04T09:59:14.0103344Z Got exit code 1 2025-12-04T09:59:14.0103445Z Retrying single test... 
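The RuntimeError above comes from the CUDA memory-leak check that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: it snapshots per-device memory statistics before the test body and again after it, and the two figures quoted per device ("was 512 and is now reported as 28160", "was 604962816 and is now 625934336") are those snapshots for the caching allocator and the CUDA driver respectively. The sketch below is only a rough approximation of that before/after comparison using public torch.cuda APIs; the helper name check_for_cuda_leak and its driver_tolerance argument are illustrative, not the actual CudaMemoryLeakCheck logic in common_utils.py.

import gc
import torch

def check_for_cuda_leak(test_fn, device=0, driver_tolerance=0):
    # Hypothetical helper: compare memory snapshots taken before and after test_fn.
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes in use
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before                   # bytes the CUDA driver has handed out

    test_fn()

    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    # Only flag a leak when the driver-level growth confirms the caching-allocator growth,
    # mirroring the "CUDA driver API confirmed a leak" wording in the failure above.
    if alloc_after > alloc_before and driver_after > driver_before + driver_tolerance:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver {driver_before} -> {driver_after} bytes"
        )
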
2025-12-04T09:59:14.0104094Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6288913bb010f746.xml 2025-12-04T09:59:14.0104304Z ============================= test session starts ============================== 2025-12-04T09:59:14.0104651Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0104760Z cachedir: .pytest_cache 2025-12-04T09:59:14.0105274Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0105392Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0105499Z configfile: pytest.ini 2025-12-04T09:59:14.0106031Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0106272Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:14.0107041Z stepcurrent: skipping 25 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0107154Z Running 1 items in this shard 2025-12-04T09:59:14.0107160Z 2025-12-04T09:59:14.0108213Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda I1204 09:58:16.043000 80117 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 80169 2025-12-04T09:59:14.0108818Z I1204 09:58:16.044000 80117 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 80170 2025-12-04T09:59:14.0109391Z I1204 09:58:16.045000 80117 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 80171 2025-12-04T09:59:14.0109829Z I1204 09:58:16.046000 80117 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 80172 2025-12-04T09:59:14.0110930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0111046Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0111957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0112114Z {} 2025-12-04T09:59:14.0112392Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0112580Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0114102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:14.0114248Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0115351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0115460Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0116353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0116508Z {} 2025-12-04T09:59:14.0116840Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0117030Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0118548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0118698Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0119792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0119932Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0121352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0121482Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0122480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0122652Z {} 2025-12-04T09:59:14.0122974Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0123190Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0124980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:14.0125154Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0126140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0126316Z {} 2025-12-04T09:59:14.0126629Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0126838Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0128570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0128731Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0129194Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0129724Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0130724Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0131309Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0132297Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0132695Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0133754Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0134230Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0135081Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0135509Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0136420Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0137006Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0137986Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0138477Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0140165Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 2025-12-04T09:59:14.0140526Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0141187Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0142318Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0142680Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0143394Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0143935Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:14.0144389Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0144921Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0145971Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0146484Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0147468Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0147865Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0149066Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0149538Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0150436Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0150890Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0151794Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0152214Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0153125Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0153616Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0155181Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 3. CUDA driver allocated memory was 407830528 and is now 625934336. 2025-12-04T09:59:14.0155524Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0156144Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0157210Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0157545Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0158223Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0158733Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:14.0159157Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0159728Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0160674Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0161154Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0162080Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0162479Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0163387Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0163848Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0164848Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0165277Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0166135Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0166526Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0167592Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0168058Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0169617Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:14.0169965Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0170583Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0171646Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0171983Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0172660Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0173221Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:14.0173645Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0174147Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0175090Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0175744Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0177050Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0177461Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0178425Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0178911Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0179867Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0180351Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0181315Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0181793Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0182764Z [rank1]:E1204 09:58:22.948000 80170 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0183248Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0184913Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:14.0185278Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0185937Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0187065Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0187428Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0188203Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0188747Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:14.0188853Z dist init r=0, world=4 2025-12-04T09:59:14.0188947Z dist init r=1, world=4 2025-12-04T09:59:14.0189151Z dist init r=2, world=4 2025-12-04T09:59:14.0189252Z dist init r=3, world=4 2025-12-04T09:59:14.0189342Z FAILED [8.5884s] [100%] 2025-12-04T09:59:14.0189348Z 2025-12-04T09:59:14.0189490Z =================================== FAILURES =================================== 2025-12-04T09:59:14.0189819Z ___ TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda ___ 2025-12-04T09:59:14.0189933Z Traceback (most recent call last): 2025-12-04T09:59:14.0190666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:14.0190769Z self._join_processes(fn) 2025-12-04T09:59:14.0191289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:14.0191419Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:14.0191952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:14.0192049Z raise RuntimeError(error) 2025-12-04T09:59:14.0192257Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:14.0192362Z Traceback (most recent call last): 2025-12-04T09:59:14.0192846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0192940Z getattr(self, test_name)() 2025-12-04T09:59:14.0193413Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0193498Z fn() 2025-12-04T09:59:14.0194128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0194255Z method(*args, **kwargs) 2025-12-04T09:59:14.0194730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0194825Z method(*args, **kwargs) 2025-12-04T09:59:14.0195298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0195385Z with policy(): 2025-12-04T09:59:14.0195863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0195965Z raise RuntimeError(msg) 2025-12-04T09:59:14.0197095Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 3. CUDA driver allocated memory was 407830528 and is now 625934336. 2025-12-04T09:59:14.0197103Z 2025-12-04T09:59:14.0197311Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0197946Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0197951Z 2025-12-04T09:59:14.0198197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0198202Z 2025-12-04T09:59:14.0198214Z 2025-12-04T09:59:14.0198417Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:14.0198660Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:14.0199468Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6288913bb010f746.xml - 2025-12-04T09:59:14.0199626Z =========================== short test summary info ============================ 2025-12-04T09:59:14.0200409Z FAILED [8.5884s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:14.0200527Z Traceback (most recent call last): 2025-12-04T09:59:14.0201045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0201155Z getattr(self, test_name)() 2025-12-04T09:59:14.0201685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0201766Z fn() 2025-12-04T09:59:14.0202246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0202345Z method(*args, **kwargs) 2025-12-04T09:59:14.0202823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0202918Z method(*args, **kwargs) 2025-12-04T09:59:14.0203390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0203484Z with policy(): 2025-12-04T09:59:14.0203960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0204060Z raise RuntimeError(msg) 2025-12-04T09:59:14.0205207Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 3. CUDA driver allocated memory was 407830528 and is now 625934336. 2025-12-04T09:59:14.0205216Z 2025-12-04T09:59:14.0205415Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0206170Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0206177Z 2025-12-04T09:59:14.0206408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0206570Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:14.0206724Z ======================= 1 failed, 26 deselected in 8.80s ======================= 2025-12-04T09:59:14.0206808Z Got exit code 1 2025-12-04T09:59:14.0207342Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0207699Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:14.0208250Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d2350a2a3a63f23.xml 2025-12-04T09:59:14.0208400Z ============================= test session starts ============================== 2025-12-04T09:59:14.0208707Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0208804Z cachedir: .pytest_cache 2025-12-04T09:59:14.0209254Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0209361Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0209458Z configfile: pytest.ini 2025-12-04T09:59:14.0209933Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0210118Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:14.0210297Z stepcurrent: skipping 26 already run items. 2025-12-04T09:59:14.0210396Z Running 1 items in this shard 2025-12-04T09:59:14.0210401Z 2025-12-04T09:59:14.0211251Z distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda I1204 09:58:29.433000 80430 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 80482 2025-12-04T09:59:14.0211692Z I1204 09:58:29.434000 80430 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 80483 2025-12-04T09:59:14.0212124Z I1204 09:58:29.435000 80430 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 80484 2025-12-04T09:59:14.0212586Z I1204 09:58:29.436000 80430 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 80485 2025-12-04T09:59:14.0214122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0214275Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0215787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:14.0215938Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0217795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0218006Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0219715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0219885Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0221110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:14.0221233Z return func(*args, **kwargs) 2025-12-04T09:59:14.0221702Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0222240Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0223248Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0223763Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0225496Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0227082Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0228573Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0230157Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0231750Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0233507Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0235002Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0236446Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0237891Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0239383Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0241457Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 2025-12-04T09:59:14.0243425Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0244521Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0246220Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0247647Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0248793Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0250101Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:14.0251161Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0252219Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0254000Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0255618Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0257498Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0259012Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0260508Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:59:14.0262134Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0263727Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0265310Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0266892Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0268433Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0270052Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0271591Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0273744Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0275722Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0276927Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0278694Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0280044Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0281121Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0282346Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:14.0283339Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0284327Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0285835Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0287301Z [rank3]:E1204 09:58:37.046000 80485 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0288749Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0290094Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0291413Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0292865Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0294262Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0295664Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0297367Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0298905Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0300461Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0302047Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0304271Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T09:59:14.0306335Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0307496Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0309372Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0310716Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0311789Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0313023Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:14.0314032Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0315018Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0316550Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0318005Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0319460Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0320938Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0322632Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0324222Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0325804Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0327382Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0328964Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0330495Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0332043Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0333785Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0335739Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0337910Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0339057Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0340872Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0342382Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0343593Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0344989Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:14.0345767Z dist init r=3, world=4 2025-12-04T09:59:14.0346036Z dist init r=2, world=4 2025-12-04T09:59:14.0346352Z dist init r=1, world=4 2025-12-04T09:59:14.0346648Z dist init r=0, world=4 2025-12-04T09:59:14.0347978Z [rank0]:[W1204 09:58:37.063140663 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:14.0349418Z FAILED [9.6294s] [100%] 2025-12-04T09:59:14.0349571Z 2025-12-04T09:59:14.0349705Z =================================== FAILURES =================================== 2025-12-04T09:59:14.0350208Z _____________ TestAutogradCUDA.test_unshard_params_as_tensors_cuda _____________ 2025-12-04T09:59:14.0350670Z Traceback (most recent call last): 2025-12-04T09:59:14.0351397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:14.0352092Z self._join_processes(fn) 2025-12-04T09:59:14.0352793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:14.0353549Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:14.0354323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:14.0355089Z raise RuntimeError(error) 2025-12-04T09:59:14.0355472Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:14.0355907Z Traceback (most recent call last): 2025-12-04T09:59:14.0356594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0357297Z getattr(self, test_name)() 2025-12-04T09:59:14.0357947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0358628Z fn() 2025-12-04T09:59:14.0359198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0359857Z method(*args, **kwargs) 2025-12-04T09:59:14.0360512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0361180Z method(*args, **kwargs) 2025-12-04T09:59:14.0361798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0362444Z with policy(): 2025-12-04T09:59:14.0363046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0363716Z raise RuntimeError(msg) 2025-12-04T09:59:14.0364868Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0365974Z 2025-12-04T09:59:14.0366164Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0366976Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0367590Z 2025-12-04T09:59:14.0367830Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0368183Z 2025-12-04T09:59:14.0368188Z 2025-12-04T09:59:14.0368388Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:14.0368923Z Process 1 terminated with exit code 10, terminating remaining processes. 
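The ProcessGroupNCCL warning earlier in this run ("destroy_process_group() was not called before program exit, which can leak resources") describes the usual cleanup for multi-process CUDA tests: every rank should tear down its process group explicitly before the worker exits. Below is a minimal sketch of that pattern with the standard torch.distributed API; the run_on_rank worker is a hypothetical stand-in, not the MultiProcessTestCase machinery this suite actually uses, and it assumes MASTER_ADDR/MASTER_PORT are already set in the environment.

import torch
import torch.distributed as dist

def run_on_rank(rank: int, world_size: int):
    torch.cuda.set_device(rank)                 # one GPU per rank
    dist.init_process_group(
        backend="nccl",
        init_method="env://",                   # reads MASTER_ADDR / MASTER_PORT
        rank=rank,
        world_size=world_size,
    )
    try:
        ...                                     # test body runs here
        dist.barrier()                          # let every rank finish before shutdown
    finally:
        dist.destroy_process_group()            # explicit teardown avoids the NCCL warning
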
2025-12-04T09:59:14.0369980Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d2350a2a3a63f23.xml - 2025-12-04T09:59:14.0370953Z =========================== short test summary info ============================ 2025-12-04T09:59:14.0371950Z FAILED [9.6294s] distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:14.0372823Z Traceback (most recent call last): 2025-12-04T09:59:14.0373513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0374211Z getattr(self, test_name)() 2025-12-04T09:59:14.0374872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0375544Z fn() 2025-12-04T09:59:14.0376134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0377083Z method(*args, **kwargs) 2025-12-04T09:59:14.0377785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0378526Z method(*args, **kwargs) 2025-12-04T09:59:14.0379225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0379964Z with policy(): 2025-12-04T09:59:14.0380629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0381379Z raise RuntimeError(msg) 2025-12-04T09:59:14.0382684Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0383926Z 2025-12-04T09:59:14.0384144Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0385049Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0385751Z 2025-12-04T09:59:14.0386016Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0386634Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:14.0387125Z ======================= 1 failed, 26 deselected in 9.85s ======================= 2025-12-04T09:59:14.0387530Z Got exit code 1 2025-12-04T09:59:14.0387787Z Retrying single test... 
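The failure above is the harness's CUDA memory leak check tripping rather than an assertion inside the test itself: the check snapshots per-device memory before the test body and compares it on exit, which is what the "Caching allocator allocated memory was 512 and is now reported as 61952" wording reports, and the printed repro line (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda) re-enables that check for just this test. A minimal sketch of the same before/after comparison using only public torch.cuda APIs; this is an illustration of the idea, not the leak-check context manager in common_utils.py, and run_with_leak_check is a hypothetical helper:

    import torch

    def run_with_leak_check(fn, device=0):
        # Snapshot caching-allocator usage and driver-level usage before the test body.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)    # caching allocator, bytes
        free_before, total = torch.cuda.mem_get_info(device)  # driver-level free/total, bytes

        fn()

        # Compare after the test; memory still held at this point is a leak candidate.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after} bytes, driver allocated "
                f"{total - free_before} -> {total - free_after} bytes"
            )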
2025-12-04T09:59:14.0388586Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee9779088060e0f5.xml 2025-12-04T09:59:14.0389541Z ============================= test session starts ============================== 2025-12-04T09:59:14.0390116Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0390633Z cachedir: .pytest_cache 2025-12-04T09:59:14.0391252Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0391928Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0392229Z configfile: pytest.ini 2025-12-04T09:59:14.0392860Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0393635Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:14.0394522Z stepcurrent: skipping 26 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0395309Z Running 1 items in this shard 2025-12-04T09:59:14.0395493Z 2025-12-04T09:59:14.0396339Z distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda I1204 09:58:43.893000 80767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 80819 2025-12-04T09:59:14.0397796Z I1204 09:58:43.894000 80767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 80820 2025-12-04T09:59:14.0398798Z I1204 09:58:43.895000 80767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 80821 2025-12-04T09:59:14.0399791Z I1204 09:58:43.896000 80767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 80822 2025-12-04T09:59:14.0401881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0403689Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0405461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0407230Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0409001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:14.0410763Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0412559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0414332Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0415461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:14.0416649Z return func(*args, **kwargs) 2025-12-04T09:59:14.0417486Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0418616Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0420276Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0422134Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0423775Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0425292Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0426858Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0428478Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0430067Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0431641Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0433284Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0434665Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0436035Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0437438Z [rank0]:E1204 09:58:51.528000 
80819 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0439388Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T09:59:14.0441216Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0442244Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0443894Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0445242Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0446322Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0447561Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:14.0448561Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0449549Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0451030Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0452489Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0453936Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0455276Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0456899Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0458491Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0460080Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0461698Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0463279Z 
[rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0464824Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0466372Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0467961Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0470232Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0472062Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0473132Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0474747Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0476091Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0477167Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0478402Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:14.0479404Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0480396Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0481873Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0483315Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0484785Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0486153Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0487485Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0488881Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0490279Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0491705Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0493359Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0494811Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0496256Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0498055Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0500247Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 
2025-12-04T09:59:14.0502329Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0503493Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0505304Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0506806Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0508019Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0509570Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:14.0510574Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0511566Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0513047Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0514503Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0516022Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0517379Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0518692Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0520095Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0521916Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0523514Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0525107Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0526639Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0528191Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0529791Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0532051Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:14.0534175Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0535195Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0537064Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0538591Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0539816Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0541207Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:14.0541987Z dist init r=1, world=4 2025-12-04T09:59:14.0542257Z dist init r=2, world=4 2025-12-04T09:59:14.0542529Z dist init r=0, world=4 2025-12-04T09:59:14.0542783Z dist init r=3, world=4 2025-12-04T09:59:14.0544107Z [rank0]:[W1204 09:58:51.540419531 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:14.0545545Z FAILED [9.7900s] [100%] 2025-12-04T09:59:14.0545715Z 2025-12-04T09:59:14.0545906Z =================================== FAILURES =================================== 2025-12-04T09:59:14.0546464Z _____________ TestAutogradCUDA.test_unshard_params_as_tensors_cuda _____________ 2025-12-04T09:59:14.0546988Z Traceback (most recent call last): 2025-12-04T09:59:14.0547768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:14.0548544Z self._join_processes(fn) 2025-12-04T09:59:14.0549396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:14.0550196Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:14.0550977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:14.0551729Z raise RuntimeError(error) 2025-12-04T09:59:14.0552127Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:14.0552551Z Traceback (most recent call last): 2025-12-04T09:59:14.0553228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0553925Z getattr(self, test_name)() 2025-12-04T09:59:14.0554581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0555259Z fn() 2025-12-04T09:59:14.0555816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0556486Z method(*args, **kwargs) 2025-12-04T09:59:14.0557109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0557761Z method(*args, **kwargs) 2025-12-04T09:59:14.0558386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0559038Z with policy(): 2025-12-04T09:59:14.0559661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0560324Z raise RuntimeError(msg) 2025-12-04T09:59:14.0561482Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:14.0562581Z 2025-12-04T09:59:14.0562770Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0563577Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0564201Z 2025-12-04T09:59:14.0564435Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0564793Z 2025-12-04T09:59:14.0564936Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:14.0565296Z Traceback (most recent call last): 2025-12-04T09:59:14.0565992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0566684Z getattr(self, test_name)() 2025-12-04T09:59:14.0567344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0568011Z fn() 2025-12-04T09:59:14.0568568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0569230Z method(*args, **kwargs) 2025-12-04T09:59:14.0569851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0570638Z method(*args, **kwargs) 2025-12-04T09:59:14.0571248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0571903Z with policy(): 2025-12-04T09:59:14.0572499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0573166Z raise RuntimeError(msg) 2025-12-04T09:59:14.0574312Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0575441Z 2025-12-04T09:59:14.0575627Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0576532Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0577396Z 2025-12-04T09:59:14.0577663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0578059Z 2025-12-04T09:59:14.0578063Z 2025-12-04T09:59:14.0578286Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:14.0578895Z Process 0 terminated with exit code 10, terminating remaining processes. 
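The _init_utils.py UserWarning repeated in the session output above ("FSDP got the argument `device_id` cuda ... which does not have an explicit index") is advisory: the test passes a bare "cuda" device, so FSDP falls back to whatever the current device happens to be on each rank. A short sketch of the explicit form the warning asks for, assuming a one-process-per-GPU setup where the process group is already initialized; the module and rank here are placeholders, not taken from test_fsdp_core.py:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_for_rank(rank: int) -> FSDP:
        # Bind this process to its GPU first, so "current device" and device_id agree.
        torch.cuda.set_device(rank)
        model = nn.Linear(8, 8)  # placeholder module; assumes init_process_group() already ran
        # An indexed device (or the bare integer rank) avoids the
        # "`device_id` cuda ... does not have an explicit index" warning.
        return FSDP(model, device_id=torch.device("cuda", rank))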
2025-12-04T09:59:14.0580078Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee9779088060e0f5.xml - 2025-12-04T09:59:14.0581185Z =========================== short test summary info ============================ 2025-12-04T09:59:14.0582231Z FAILED [9.7900s] distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:14.0583211Z Traceback (most recent call last): 2025-12-04T09:59:14.0583993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0584782Z getattr(self, test_name)() 2025-12-04T09:59:14.0585557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0586316Z fn() 2025-12-04T09:59:14.0586954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0587693Z method(*args, **kwargs) 2025-12-04T09:59:14.0588393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0589225Z method(*args, **kwargs) 2025-12-04T09:59:14.0589839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0590485Z with policy(): 2025-12-04T09:59:14.0591079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0591750Z raise RuntimeError(msg) 2025-12-04T09:59:14.0592907Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:14.0594001Z 2025-12-04T09:59:14.0594189Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0594997Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0595626Z 2025-12-04T09:59:14.0595859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0596209Z 2025-12-04T09:59:14.0596388Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:14.0596767Z Traceback (most recent call last): 2025-12-04T09:59:14.0597453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0598155Z getattr(self, test_name)() 2025-12-04T09:59:14.0598804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0599479Z fn() 2025-12-04T09:59:14.0600041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0600701Z method(*args, **kwargs) 2025-12-04T09:59:14.0601340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0602000Z method(*args, **kwargs) 2025-12-04T09:59:14.0602624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0603277Z with policy(): 2025-12-04T09:59:14.0603859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0604529Z raise RuntimeError(msg) 2025-12-04T09:59:14.0605681Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0606770Z 2025-12-04T09:59:14.0606962Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0607761Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0608387Z 2025-12-04T09:59:14.0608619Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0609134Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:14.0609565Z ====================== 1 failed, 26 deselected in 10.01s ======================= 2025-12-04T09:59:14.0609930Z Got exit code 1 2025-12-04T09:59:14.0610203Z Retrying single test... 
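The other UserWarning in that output, from c10d_logger.py ("barrier(): using the device under current context"), can be silenced the way the message suggests: tell the process group which accelerator each rank owns when the group is created. A sketch under the same one-GPU-per-rank assumption; the rendezvous address and world size are placeholders, and the device_id argument to init_process_group is the one the warning itself names (available in recent PyTorch releases):

    import torch
    import torch.distributed as dist

    def init_for_rank(rank: int, world_size: int) -> None:
        torch.cuda.set_device(rank)
        # Binding the group to an explicit device lets collectives such as barrier()
        # use that GPU instead of guessing from the current context.
        dist.init_process_group(
            backend="nccl",
            init_method="tcp://127.0.0.1:29500",  # placeholder rendezvous address
            rank=rank,
            world_size=world_size,
            device_id=torch.device("cuda", rank),
        )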
2025-12-04T09:59:14.0610918Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a7aa8c4ec058e09.xml 2025-12-04T09:59:14.0611721Z ============================= test session starts ============================== 2025-12-04T09:59:14.0612291Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0612810Z cachedir: .pytest_cache 2025-12-04T09:59:14.0613419Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0614093Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0614389Z configfile: pytest.ini 2025-12-04T09:59:14.0615019Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0615792Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:14.0616935Z stepcurrent: skipping 26 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0617828Z Running 1 items in this shard 2025-12-04T09:59:14.0618030Z 2025-12-04T09:59:14.0618979Z distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda I1204 09:58:58.174000 81104 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 81156 2025-12-04T09:59:14.0620538Z I1204 09:58:58.175000 81104 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 81157 2025-12-04T09:59:14.0621969Z I1204 09:58:58.176000 81104 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 81158 2025-12-04T09:59:14.0623095Z I1204 09:58:58.176000 81104 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 81159 2025-12-04T09:59:14.0625451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0627458Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0629503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0631507Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0633540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:14.0635303Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0637080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0638850Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0640028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:14.0641126Z return func(*args, **kwargs) 2025-12-04T09:59:14.0641714Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0642712Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0644187Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0645649Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0647093Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0648437Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0649767Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0651291Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0659403Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0661025Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0662616Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0664263Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0665815Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0667405Z [rank0]:E1204 09:59:05.749000 
81156 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0669734Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 2025-12-04T09:59:14.0671561Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0672585Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0674377Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0675842Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0676979Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0678289Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:14.0679352Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0680407Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0681968Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0683722Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0685306Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0686779Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0688260Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0689819Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0691356Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0692884Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0694407Z 
[rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0695946Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0697767Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0699351Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0701540Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:14.0703585Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0704741Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0706584Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0708212Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0709395Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0710737Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:14.0711910Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0712967Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0714532Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0716074Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0717683Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0719032Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0720415Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0722269Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0723860Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0725432Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0727097Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0728629Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0730178Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0731764Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0734019Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:14.0735840Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0737188Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0739009Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0740521Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0741734Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0743125Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:14.0744257Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0745371Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0747042Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0748669Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0750256Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0751677Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0753008Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0754408Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0755808Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0757245Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0758649Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0760021Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0761387Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0762790Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0764734Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:14.0766583Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0767619Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0769234Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0770572Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0771653Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0772891Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:14.0773585Z dist init r=0, world=4 2025-12-04T09:59:14.0773822Z dist init r=3, world=4 2025-12-04T09:59:14.0774060Z dist init r=1, world=4 2025-12-04T09:59:14.0774291Z dist init r=2, world=4 2025-12-04T09:59:14.0775463Z [rank0]:[W1204 09:59:06.761351963 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:14.0776936Z FAILED [9.8205s] [100%] 2025-12-04T09:59:14.0777117Z 2025-12-04T09:59:14.0777264Z =================================== FAILURES =================================== 2025-12-04T09:59:14.0777865Z _____________ TestAutogradCUDA.test_unshard_params_as_tensors_cuda _____________ 2025-12-04T09:59:14.0778409Z Traceback (most recent call last): 2025-12-04T09:59:14.0779186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:14.0779971Z self._join_processes(fn) 2025-12-04T09:59:14.0780758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:14.0781606Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:14.0782476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:14.0783353Z raise RuntimeError(error) 2025-12-04T09:59:14.0783787Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:14.0784260Z Traceback (most recent call last): 2025-12-04T09:59:14.0785039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0785824Z getattr(self, test_name)() 2025-12-04T09:59:14.0786560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0787316Z fn() 2025-12-04T09:59:14.0787947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0788805Z method(*args, **kwargs) 2025-12-04T09:59:14.0789552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0790218Z method(*args, **kwargs) 2025-12-04T09:59:14.0790840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0791487Z with policy(): 2025-12-04T09:59:14.0792086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0792759Z raise RuntimeError(msg) 2025-12-04T09:59:14.0793949Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0795048Z 2025-12-04T09:59:14.0795241Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0796050Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0796673Z 2025-12-04T09:59:14.0796906Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0797260Z 2025-12-04T09:59:14.0797264Z 2025-12-04T09:59:14.0797464Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:14.0798015Z Process 1 terminated with exit code 10, terminating remaining processes. 
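Each attempt also ends with the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit; here that is most likely a consequence of the ranks aborting early with exit code 10 rather than a separate bug. For reference, the shutdown pattern the warning and the linked distributed.html#shutdown docs point to looks roughly like this in user code (the body is a placeholder):

    import torch.distributed as dist

    def worker(rank: int, world_size: int) -> None:
        # ... init_process_group(...) and the actual training/test body go here ...
        try:
            pass  # placeholder body
        finally:
            # Tear down the communicators explicitly so the process exits cleanly
            # instead of emitting the "destroy_process_group() was not called" warning.
            if dist.is_initialized():
                dist.destroy_process_group()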
2025-12-04T09:59:14.0799074Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a7aa8c4ec058e09.xml -
2025-12-04T09:59:14.0800058Z =========================== short test summary info ============================
2025-12-04T09:59:14.0800999Z FAILED [9.8205s] distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:59:14.0801869Z Traceback (most recent call last):
2025-12-04T09:59:14.0802773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:14.0803513Z getattr(self, test_name)()
2025-12-04T09:59:14.0804248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:14.0805007Z fn()
2025-12-04T09:59:14.0805606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:14.0806308Z method(*args, **kwargs)
2025-12-04T09:59:14.0806966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:14.0807659Z method(*args, **kwargs)
2025-12-04T09:59:14.0808312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:14.0809001Z with policy():
2025-12-04T09:59:14.0809508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:14.0809609Z raise RuntimeError(msg)
2025-12-04T09:59:14.0810658Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160.
2025-12-04T09:59:14.0810666Z
2025-12-04T09:59:14.0810866Z To execute this test, run the following from the base repo dir:
2025-12-04T09:59:14.0811409Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda
2025-12-04T09:59:14.0811415Z
2025-12-04T09:59:14.0811662Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:14.0811830Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
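Separately, the ProcessGroupNCCL warning earlier in this log ("destroy_process_group() was not called before program exit, which can leak resources") points at missing teardown in the spawned ranks. A minimal sketch of the init/teardown pattern it asks for is below; the run_rank worker, the gloo backend, and the address/port values are illustrative assumptions, not taken from the failing test's actual NCCL setup.

```python
# Minimal init/teardown sketch for the "destroy_process_group() was not called
# before program exit" warning above. The run_rank worker, gloo backend, and
# address/port values are illustrative, not taken from the failing test.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def run_rank(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    try:
        t = torch.ones(1) * rank
        dist.all_reduce(t)  # every rank ends up with 0 + 1 + 2 + 3 = 6
    finally:
        # The call the ProcessGroupNCCL warning says was skipped before exit.
        dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(run_rank, args=(4,), nprocs=4)  # world=4, matching the "dist init" lines above
```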
2025-12-04T09:59:14.0811999Z ====================== 1 failed, 26 deselected in 10.04s ======================= 2025-12-04T09:59:14.0812086Z Got exit code 1 2025-12-04T09:59:14.0812554Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0812936Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:14.0813548Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4f45a35aeec028b0.xml 2025-12-04T09:59:14.0813708Z ============================= test session starts ============================== 2025-12-04T09:59:14.0814033Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0814137Z cachedir: .pytest_cache 2025-12-04T09:59:14.0814613Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0814726Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0814834Z configfile: pytest.ini 2025-12-04T09:59:14.0815332Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0815536Z collecting ... collected 60 items / 27 deselected / 33 selected 2025-12-04T09:59:14.0815672Z stepcurrent: skipping 27 already run items. 2025-12-04T09:59:14.0815774Z Running 0 items in this shard 2025-12-04T09:59:14.0815779Z 2025-12-04T09:59:14.0816627Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4f45a35aeec028b0.xml - 2025-12-04T09:59:14.0816965Z ============================ 27 deselected in 0.02s ============================ 2025-12-04T09:59:14.0833570Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda', 
'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda'] 2025-12-04T09:59:14.0833668Z 2025-12-04T09:59:14.0834247Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_core 2/2 (test/test-reports/distributed.fsdp.test_fsdp_core_2.2_6137898c6891d430_.log) 2025-12-04T09:59:14.0834260Z 2025-12-04T09:59:14.0834612Z Finished distributed/fsdp/test_fsdp_core 2/2 ... 
[2025-12-04 09:59:13.015565][3984.623477463], took 30.45min 2025-12-04T09:59:14.0835414Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90a070d9a0caeaa7.xml 2025-12-04T09:59:14.0836217Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b56b818e7dab969.xml 2025-12-04T09:59:14.0837006Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2da5f79ab7711605.xml 2025-12-04T09:59:14.0838025Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a202ac92fafcf85d.xml 2025-12-04T09:59:14.0838870Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bacdfd4e137b31c0.xml 2025-12-04T09:59:14.0839686Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2f84fddbafa0e0f3.xml 2025-12-04T09:59:14.0840501Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8511307d41418b77.xml 2025-12-04T09:59:14.0841454Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3768a5b2a44119fc.xml 2025-12-04T09:59:14.0842260Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31ee953fde08a139.xml 2025-12-04T09:59:14.0843053Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cf0a0887fe85c292.xml 2025-12-04T09:59:14.0843843Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-07c27c95d6f3d3d6.xml 2025-12-04T09:59:14.0844631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ec3b2535e8e2ad7.xml 2025-12-04T09:59:14.0845418Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c7bc1bec56d6360.xml 2025-12-04T09:59:14.0846215Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1003ee713f2c1e3e.xml 2025-12-04T09:59:14.0847035Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-86ef8482fc5a0e9d.xml 2025-12-04T09:59:14.0847829Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e9238188d8477a2.xml 2025-12-04T09:59:14.0848612Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9476e56094f0b738.xml 2025-12-04T09:59:14.0849410Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-207ff9590d724b3a.xml 2025-12-04T09:59:14.0850287Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f664e87214ff2805.xml 2025-12-04T09:59:14.0851035Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-def950b7d24ceea9.xml 2025-12-04T09:59:14.0851789Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-89dfbd7b5cd71317.xml 2025-12-04T09:59:14.0852535Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bdae057bafb686b9.xml 2025-12-04T09:59:14.0853281Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-eb4953947b5f3ef2.xml 2025-12-04T09:59:14.0854083Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-532f83d54e2054ff.xml 2025-12-04T09:59:14.0854836Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3483d762b5b4fca1.xml 2025-12-04T09:59:14.0855603Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c6b2032ef8ff1e94.xml 2025-12-04T09:59:14.0856417Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5647de3303d26f02.xml 2025-12-04T09:59:14.0857439Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cff7e7504b276d84.xml 2025-12-04T09:59:14.0858290Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d2fb83ab3ccdeb6.xml 2025-12-04T09:59:14.1017508Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd911142cc34300e.xml 2025-12-04T09:59:14.1323304Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8e84025a0dc7a16.xml 2025-12-04T09:59:14.1659743Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-392d2e7951c1c5f3.xml 2025-12-04T09:59:14.2195397Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-477ee10c9167da98.xml 2025-12-04T09:59:14.2500022Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96eeb012f5f596ba.xml 2025-12-04T09:59:14.2869879Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc37fd9d84da442a.xml 2025-12-04T09:59:14.3137792Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cbd7e5f481e859be.xml 2025-12-04T09:59:14.3421883Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ede249f1a681285.xml 2025-12-04T09:59:14.3741107Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-11be05c94e086d26.xml 2025-12-04T09:59:14.4017764Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16966e8ed8e62900.xml 2025-12-04T09:59:14.4315424Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90420efea6f00dc5.xml 2025-12-04T09:59:14.4588168Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c9f36ab2b8b15ae.xml 2025-12-04T09:59:14.4915668Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d4c1fd96adc2be7.xml 2025-12-04T09:59:14.5484276Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-500277f28031837e.xml 2025-12-04T09:59:14.5813072Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-942d56c07e16c88d.xml 2025-12-04T09:59:14.6140843Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-55fdf9ad8e0a27f0.xml 2025-12-04T09:59:14.6436680Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e1cdaa245647d1a.xml 2025-12-04T09:59:14.6819748Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a996648fbbff19f5.xml 2025-12-04T09:59:14.7156304Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc1573489c80017b.xml 2025-12-04T09:59:14.7466922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4d2b72d464b1c339.xml 2025-12-04T09:59:14.7782957Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65dbafa4918c0ef1.xml 2025-12-04T09:59:14.8092315Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8e1f7dea233320.xml 2025-12-04T09:59:14.8552606Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d13641fc6f0b57c.xml 2025-12-04T09:59:14.8912824Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29e66d82c97dbaa5.xml 2025-12-04T09:59:14.9316356Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a798bbedf3e7b999.xml 2025-12-04T09:59:14.9674320Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0d5d8a174cb3c98.xml 2025-12-04T09:59:15.0020386Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-931d013fb4c2579a.xml 2025-12-04T09:59:15.0398611Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-92646f491493cae0.xml 2025-12-04T09:59:15.0734937Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8232c23afc6466e0.xml 2025-12-04T09:59:15.1099730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-983af60bcd722f1d.xml 2025-12-04T09:59:15.1375031Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-84ede3fbd174dfda.xml 2025-12-04T09:59:15.1677054Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9538bfd24f807d16.xml 2025-12-04T09:59:15.1973921Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e7d2c56cd2be4bb.xml 2025-12-04T09:59:15.2301640Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1378f62336ac1630.xml 2025-12-04T09:59:15.2641432Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e092965a6aa7362.xml 2025-12-04T09:59:15.2994812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19aef0a0802c58a7.xml 2025-12-04T09:59:15.3301589Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e8c70689f4db333.xml 2025-12-04T09:59:15.3634400Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-389219a70e101b44.xml 2025-12-04T09:59:15.3923946Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22aad73f608511a0.xml 2025-12-04T09:59:15.4216581Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22bb81621d944803.xml 2025-12-04T09:59:15.4479077Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e70588b2995dc7c5.xml 2025-12-04T09:59:15.4807552Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b456a18c8ca9135a.xml 2025-12-04T09:59:15.5098742Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aedba904eee3ba73.xml 2025-12-04T09:59:15.5393137Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3d36f137cb39b5.xml 2025-12-04T09:59:15.5685612Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-973a0dc84b27de93.xml 2025-12-04T09:59:15.6056029Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e9342b39aaf3792.xml 2025-12-04T09:59:15.6374737Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15b775a41cf5a439.xml 2025-12-04T09:59:15.6644452Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56374ffd8bd068de.xml 2025-12-04T09:59:15.6968210Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6288913bb010f746.xml 2025-12-04T09:59:15.7295041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d2350a2a3a63f23.xml 2025-12-04T09:59:15.7659863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee9779088060e0f5.xml 2025-12-04T09:59:15.7911959Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a7aa8c4ec058e09.xml 2025-12-04T09:59:15.8220099Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4f45a35aeec028b0.xml 2025-12-04T09:59:16.2357690Z Uploading logs for 57116084904 to S3 2025-12-04T09:59:16.3394903Z Uploading artifacts took 0.49 seconds 2025-12-04T09:59:16.3395333Z distributed/fsdp/test_fsdp_core 2/2 failed! 2025-12-04T09:59:16.3396294Z Running distributed/algorithms/test_join 1/1 ... 
[2025-12-04 09:59:16.339489][3987.947405768] 2025-12-04T09:59:16.3396936Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:59:16.3399995Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/algorithms/test_join.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:59:16.339819] 2025-12-04T10:00:12.6004999Z 2025-12-04T10:00:12.6006110Z distributed/algorithms/test_join 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.algorithms.test_join_1.1_8f0ad2e1263a10f0_.log 2025-12-04T10:00:12.6010923Z Running 9 items in this shard: test/distributed/algorithms/test_join.py::TestJoin::test_join_kwargs, test/distributed/algorithms/test_join.py::TestJoin::test_multiple_joinable_disable, test/distributed/algorithms/test_join.py::TestJoin::test_multiple_joinables, test/distributed/algorithms/test_join.py::TestJoin::test_multiple_joinables_throw, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_disable, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_main_hooks, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_post_hooks, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_throw 2025-12-04T10:00:12.6014659Z 2025-12-04T10:00:12.6015066Z Finished distributed/algorithms/test_join 1/1 ... [2025-12-04 10:00:12.600102][4044.208018492], took 0.94min 2025-12-04T10:00:12.6186394Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.algorithms.test_join/distributed.algorithms.test_join-346fdf8ca2d8d04c.xml 2025-12-04T10:00:12.7037445Z Running distributed/pipelining/test_schedule_multiproc 1/1 ... [2025-12-04 10:00:12.703118][4044.311035497] 2025-12-04T10:00:12.7038172Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:00:12.7039683Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_schedule_multiproc.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:00:12.703467] 2025-12-04T10:00:33.3692728Z 2025-12-04T10:00:33.3694062Z distributed/pipelining/test_schedule_multiproc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_schedule_multiproc_1.1_3173a38c7a75b752_.log 2025-12-04T10:00:33.3715676Z Running 34 items in this shard: test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_custom_function_callback, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass3, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass4, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_forward_only_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass0_shape_inference_False, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass0_shape_inference_True, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass1_shape_inference_False, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass1_shape_inference_True, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_tracer_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_tracer_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_kwargs_with_tracer_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_kwargs_with_tracer_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_multi_iter_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_multi_iter_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass3, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass4, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_schedule_with_weight_update_mlp_e2e_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_v_shape_schedules_schedule_class0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_v_shape_schedules_schedule_class1, 
test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_zero_bubble_with_model_kwargs_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_zero_bubble_with_model_kwargs_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_non_symmetric_stage_ids_schedule_class0, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_non_symmetric_stage_ids_schedule_class1, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_pipeline_schedule_runtime_custom_sched_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_schedule_with_native_zero_bubble_ScheduleClass0 2025-12-04T10:00:33.3737412Z 2025-12-04T10:00:33.3737905Z Finished distributed/pipelining/test_schedule_multiproc 1/1 ... [2025-12-04 10:00:33.368878][4064.976794701], took 0.34min 2025-12-04T10:00:33.3876869Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.pipelining.test_schedule_multiproc/distributed.pipelining.test_schedule_multiproc-4c892aab54fe07b4.xml 2025-12-04T10:00:33.4722955Z Running distributed/test_compute_comm_reordering 1/1 ... [2025-12-04 10:00:33.471636][4065.079554714] 2025-12-04T10:00:33.4723621Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:00:33.4724962Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_compute_comm_reordering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:00:33.471972] 2025-12-04T10:02:53.3819012Z 2025-12-04T10:02:53.3820254Z distributed/test_compute_comm_reordering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_compute_comm_reordering_1.1_7c582fe21d8b6d0b_.log 2025-12-04T10:02:53.3828370Z Running 9 items in this shard: test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_grouped_scheduler_node_combo_kernels_False, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_grouped_scheduler_node_combo_kernels_True, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_inductor_default_comms_ordering, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_nccl_heuristics, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_raise_comms, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_reorder_compute_for_overlap, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_reorder_compute_for_overlap_custom_runtime_estimation, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits_raise_comms 2025-12-04T10:02:53.3834730Z 2025-12-04T10:02:53.3835147Z Finished distributed/test_compute_comm_reordering 1/1 ... 
[2025-12-04 10:02:53.381486][4204.98940222], took 2.33min 2025-12-04T10:02:53.4000473Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_compute_comm_reordering/distributed.test_compute_comm_reordering-5eeb11f30d43fbd8.xml 2025-12-04T10:02:53.4869410Z Running distributed/test_cupy_as_tensor 1/1 ... [2025-12-04 10:02:53.486196][4205.094114024] 2025-12-04T10:02:53.4870038Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:02:53.4871273Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_cupy_as_tensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:02:53.486557] 2025-12-04T10:02:57.2605297Z 2025-12-04T10:02:57.2606438Z distributed/test_cupy_as_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_cupy_as_tensor_1.1_01ccc395c80cccfc_.log 2025-12-04T10:02:57.2607827Z Running 1 items in this shard: test/distributed/test_cupy_as_tensor.py::CupyAsTensorTest::test_cupy_as_tensor 2025-12-04T10:02:57.2608396Z 2025-12-04T10:02:57.2609085Z Finished distributed/test_cupy_as_tensor 1/1 ... [2025-12-04 10:02:57.259795][4208.867711215], took 0.06min 2025-12-04T10:02:57.2779914Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_cupy_as_tensor/distributed.test_cupy_as_tensor-9bf0be6a7af397ad.xml 2025-12-04T10:02:57.3104581Z Running distributed/fsdp/test_fsdp_fx 1/1 ... [2025-12-04 10:02:57.310235][4208.918152255] 2025-12-04T10:02:57.3105193Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:02:57.3107721Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_fx.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:02:57.310565] 2025-12-04T10:03:02.3880415Z 2025-12-04T10:03:02.3881567Z distributed/fsdp/test_fsdp_fx 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_fx_1.1_5233411b5b9ade93_.log 2025-12-04T10:03:02.3883012Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_fx.py::TestSymbolicTracingCUDA::test_symbolic_tracing_outputs_cuda 2025-12-04T10:03:02.3883670Z 2025-12-04T10:03:02.3884045Z Finished distributed/fsdp/test_fsdp_fx 1/1 ... [2025-12-04 10:03:02.387460][4213.995376468], took 0.08min 2025-12-04T10:03:02.4059627Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fx/distributed.fsdp.test_fsdp_fx-d8b89ec57f22953e.xml 2025-12-04T10:03:02.4397680Z Running distributed/_tools/test_sac_ilp 1/1 ... [2025-12-04 10:03:02.439140][4214.047057176] 2025-12-04T10:03:02.4398292Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:02.4399830Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_tools/test_sac_ilp.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:03:02.439504] 2025-12-04T10:03:14.4828329Z 2025-12-04T10:03:14.4829466Z distributed/_tools/test_sac_ilp 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._tools.test_sac_ilp_1.1_aac1d3e83d5577ad_.log 2025-12-04T10:03:14.4831989Z Running 4 items in this shard: test/distributed/_tools/test_sac_ilp.py::TestSACILP::test_sac_ilp_case1, test/distributed/_tools/test_sac_ilp.py::TestSACILP::test_sac_ilp_case2, test/distributed/_tools/test_sac_ilp.py::TestSACILP::test_sac_ilp_case3, test/distributed/_tools/test_sac_ilp.py::TestOptimalCheckpointingPolicy::test_get_optimial_checkpointing_policy_per_module 2025-12-04T10:03:14.4834016Z 2025-12-04T10:03:14.4834401Z Finished distributed/_tools/test_sac_ilp 1/1 ... [2025-12-04 10:03:14.482447][4226.090356853], took 0.20min 2025-12-04T10:03:14.5012010Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._tools.test_sac_ilp/distributed._tools.test_sac_ilp-80280b96b0e30cba.xml 2025-12-04T10:03:14.5973849Z Running distributed/checkpoint/test_hf_storage 1/1 ... [2025-12-04 10:03:14.596734][4226.204651502] 2025-12-04T10:03:14.5974489Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:14.5975782Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_hf_storage.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:03:14.597097] 2025-12-04T10:03:18.8731315Z 2025-12-04T10:03:18.8732510Z distributed/checkpoint/test_hf_storage 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_hf_storage_1.1_ec1da04f72df0c46_.log 2025-12-04T10:03:18.8735928Z Running 5 items in this shard: test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_read_data_hf, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_read_metadata_hf, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_write_data_hf, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_write_data_with_sharding, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_write_metadata_hf 2025-12-04T10:03:18.8738593Z 2025-12-04T10:03:18.8739028Z Finished distributed/checkpoint/test_hf_storage 1/1 ... [2025-12-04 10:03:18.872644][4230.480560583], took 0.07min 2025-12-04T10:03:18.8915727Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_hf_storage/distributed.checkpoint.test_hf_storage-5c05eca826b12737.xml 2025-12-04T10:03:18.9271432Z Running distributed/pipelining/test_microbatch 1/1 ... [2025-12-04 10:03:18.926521][4230.534438755] 2025-12-04T10:03:18.9272102Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:18.9273406Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_microbatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:03:18.926881] 2025-12-04T10:03:37.4859206Z 2025-12-04T10:03:37.4860440Z distributed/pipelining/test_microbatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_microbatch_1.1_e0b58af1802f4b06_.log 2025-12-04T10:03:37.4864445Z Running 5 items in this shard: test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_chunk_spec_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_and_merge_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_block_mask_batch_size_one_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_block_mask_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_block_mask_none_cuda 2025-12-04T10:03:37.4867262Z 2025-12-04T10:03:37.4867700Z Finished distributed/pipelining/test_microbatch 1/1 ... [2025-12-04 10:03:37.485542][4249.093458024], took 0.31min 2025-12-04T10:03:37.5046440Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.pipelining.test_microbatch/distributed.pipelining.test_microbatch-db2f7f262044cd4d.xml 2025-12-04T10:03:37.5881575Z Running distributed/tensor/test_placement_types 1/1 ... [2025-12-04 10:03:37.587571][4249.195488742] 2025-12-04T10:03:37.5882211Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:37.5883678Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_placement_types.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:03:37.587914] 2025-12-04T10:03:41.3619885Z 2025-12-04T10:03:41.3621298Z distributed/tensor/test_placement_types 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_placement_types_1.1_c7b4602e70c3b07a_.log 2025-12-04T10:03:41.3625091Z Running 5 items in this shard: test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_dynamo_can_identify_placement_classes, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_equality, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_strided_shard_isinstance_shard, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_strided_shard_kwonly_argument, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_type_identification 2025-12-04T10:03:41.3628149Z 2025-12-04T10:03:41.3628559Z Finished distributed/tensor/test_placement_types 1/1 ... [2025-12-04 10:03:41.361730][4252.96964606], took 0.06min 2025-12-04T10:03:41.3808241Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_placement_types/distributed.tensor.test_placement_types-aa6a82bf337fac31.xml 2025-12-04T10:03:41.4139464Z Running distributed/tensor/test_dtensor_dispatch_overhead 1/1 ... [2025-12-04 10:03:41.413418][4253.021336835] 2025-12-04T10:03:41.4140204Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:41.4141595Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_dtensor_dispatch_overhead.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:03:41.413792] 2025-12-04T10:03:51.1034727Z 2025-12-04T10:03:51.1036012Z distributed/tensor/test_dtensor_dispatch_overhead 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_dtensor_dispatch_overhead_1.1_85c49e7d8275b78b_.log 2025-12-04T10:03:51.1037844Z Running 1 items in this shard: test/distributed/tensor/test_dtensor_dispatch_overhead.py::DistOpDispatchOverHead::test_dtensor_add_op_dispatch_overhead 2025-12-04T10:03:51.1038640Z 2025-12-04T10:03:51.1039128Z Finished distributed/tensor/test_dtensor_dispatch_overhead 1/1 ... [2025-12-04 10:03:51.102835][4262.710751323], took 0.16min 2025-12-04T10:03:51.1220066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_dtensor_dispatch_overhead/distributed.tensor.test_dtensor_dispatch_overhead-1be227e0f3a4b8ca.xml 2025-12-04T10:03:51.1934669Z Running distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 ... [2025-12-04 10:03:51.192867][4262.800784213] 2025-12-04T10:03:51.1935428Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:51.1937352Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/_experimental/test_checkpoint_reader.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:03:51.193211] 2025-12-04T10:03:55.4687268Z 2025-12-04T10:03:55.4688735Z distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint._experimental.test_checkpoint_reader_1.1_68c37a9fa1601552_.log 2025-12-04T10:03:55.4694729Z Running 7 items in this shard: test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_partial_read, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_partial_read_different_dtypes, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_partial_read_missing_keys, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_checkpoint, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_nonexistent_checkpoint, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_with_kwargs, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_with_map_location 2025-12-04T10:03:55.4699645Z 2025-12-04T10:03:55.4700217Z Finished distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 ... [2025-12-04 10:03:55.468045][4267.075962106], took 0.07min 2025-12-04T10:03:55.4879110Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpoint_reader/distributed.checkpoint._experimental.test_checkpoint_reader-e75c494c472cf9a1.xml 2025-12-04T10:03:55.5174872Z Running distributed/checkpoint/test_format_utils 1/1 ... 
[2025-12-04 10:03:55.516825][4267.124743111] 2025-12-04T10:03:55.5175548Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:55.5177197Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_format_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:03:55.517165] 2025-12-04T10:04:15.5861057Z 2025-12-04T10:04:15.5862641Z distributed/checkpoint/test_format_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_format_utils_1.1_04ae55b8cdf477fd_.log 2025-12-04T10:04:15.5868334Z Running 3 items in this shard: test/distributed/checkpoint/test_format_utils.py::TestFormatUtils::test_dcp_to_torch_save, test/distributed/checkpoint/test_format_utils.py::TestFormatUtils::test_online_torch_save_to_dcp, test/distributed/checkpoint/test_format_utils.py::TestFormatUtils::test_torch_save_to_dcp 2025-12-04T10:04:15.5869910Z 2025-12-04T10:04:15.5870354Z Finished distributed/checkpoint/test_format_utils 1/1 ... [2025-12-04 10:04:15.585956][4287.193872573], took 0.33min 2025-12-04T10:04:15.6059440Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_format_utils/distributed.checkpoint.test_format_utils-ff4efe8ffc0a39b9.xml 2025-12-04T10:04:15.6842526Z Running distributed/test_aten_comm_compute_reordering 1/2 ... [2025-12-04 10:04:15.683661][4287.291578499] 2025-12-04T10:04:15.6843200Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:04:15.6844525Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_aten_comm_compute_reordering.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:04:15.684014] 2025-12-04T10:10:07.8099086Z 2025-12-04T10:10:07.8100556Z distributed/test_aten_comm_compute_reordering 1/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_aten_comm_compute_reordering_1.2_69f8c7d62333ccaf_.log 2025-12-04T10:10:07.8117337Z Running 25 items in this shard: test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_grouped_scheduler_node, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits_raise_comms, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_basic_all_reduce_bucketing, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucket_exposed_with_hidden_single_overlap, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_split_for_overlap, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_split_for_overlap_blocking_deps_inductor, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_split_for_overlap_blocking_no_deps, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_wait_sink, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_inductor_default_comms_ordering, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_multidtype_bucketing, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_no_bucketing_when_collective_depends_on_hiding_node, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_overlap_scheduling_via_config, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_reorder_compute_for_overlap_mul, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_schedulable_wait, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_sink_waits, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_bucketing_reordering_pass_no_bucket, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_bucketing_reordering_pass_single_bucket, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_grouped_scheduler_node, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_make_graph_view_and_get_subgraph_by_path_custom_module_stack_fn, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_manual_reordering_bucketing_pass_separate_buckets, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_reorder_compute_for_overlap_mul, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_schedulable_wait, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_sink_waits, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_sink_waits_raise_comms 2025-12-04T10:10:07.8133896Z 2025-12-04T10:10:07.8134340Z Finished 
distributed/test_aten_comm_compute_reordering 1/2 ... [2025-12-04 10:10:07.809777][4639.417690402], took 5.87min 2025-12-04T10:10:07.8299462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_aten_comm_compute_reordering/distributed.test_aten_comm_compute_reordering-8ab49fa352932ba1.xml 2025-12-04T10:10:07.9403071Z Running distributed/tensor/test_redistribute 2/2 ... [2025-12-04 10:10:07.939771][4639.547688652] 2025-12-04T10:10:07.9403735Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:10:07.9405041Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_redistribute.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:10:07.940119] 2025-12-04T10:11:46.8467836Z 2025-12-04T10:11:46.8469265Z distributed/tensor/test_redistribute 2/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_redistribute_2.2_51e2d05d075503bf_.log 2025-12-04T10:11:46.8489696Z Running 33 items in this shard: test/distributed/tensor/test_redistribute.py::RedistributeTest::test_one_chunk_mesh, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_partial_to_replicate_forward_backward_float32, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_partial_to_shard_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_local_partial_grad_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_local_partial_grad_float32, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_replicate_forward_backward_datatype_conversion, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_shard_forward_backward, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_shard_dim_alltoall_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_shard_dim_alltoall_float32, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_shard_to_replicate_forward_backward_complex64, test/distributed/tensor/test_redistribute.py::MultiDimRedistributeTest::test_redistribute_shard_dim_multi_dim_mesh, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_generate_shard_orders, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_ordered_distribute_all_combination, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_ordered_redistribute_with_partial, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_shard_order_same_data_as_strided_shard, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_one_chunk_mesh, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_partial_to_replicate_forward_backward_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_partial_to_replicate_forward_backward_float32, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_partial_to_shard_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_redistribute_negative_shard_dim, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_redistribute_to_partial, 
test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_redistribute_uneven_sharding, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_partial, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_replicate_forward_backward, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_replicate_forward_backward_datatype_conversion, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_shard_dim_alltoall_float32, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_shard_to_replicate_forward_backward_datatype_conversion, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_shard_to_replicate_forward_backward_float32, test/distributed/tensor/test_redistribute.py::MultiDimRedistributeTestWithLocalTensor::test_multi_dim_mesh, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_generate_shard_orders, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_ordered_redistribute, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_ordered_redistribute_for_special_placement, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_ordered_redistribute_with_partial 2025-12-04T10:11:46.8509863Z 2025-12-04T10:11:46.8510276Z Finished distributed/tensor/test_redistribute 2/2 ... [2025-12-04 10:11:46.846499][4738.454415028], took 1.65min 2025-12-04T10:11:46.8665850Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_redistribute/distributed.tensor.test_redistribute-02b614c0805e2900.xml 2025-12-04T10:11:46.9723185Z Running distributed/tensor/parallel/test_tp_style 1/1 ... [2025-12-04 10:11:46.971809][4738.579726845] 2025-12-04T10:11:46.9723873Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:11:46.9725212Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/parallel/test_tp_style.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:11:46.972182] 2025-12-04T10:12:54.1565587Z 2025-12-04T10:12:54.1566801Z distributed/tensor/parallel/test_tp_style 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.parallel.test_tp_style_1.1_54e71dcd4ed048eb_.log 2025-12-04T10:12:54.1579302Z Running 18 items in this shard: test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_colwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_colwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_input_multiple_inputs, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_kwargs_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_output, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_rowwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_rowwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_sequence_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_colwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_colwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_input_multiple_inputs, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_kwargs_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_output, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_rowwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_rowwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_sequence_parallel_style 2025-12-04T10:12:54.1590930Z 2025-12-04T10:12:54.1591375Z Finished distributed/tensor/parallel/test_tp_style 1/1 ... [2025-12-04 10:12:54.156163][4805.764079227], took 1.12min 2025-12-04T10:12:54.1766153Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.parallel.test_tp_style/distributed.tensor.parallel.test_tp_style-3daa17d4beb2059f.xml 2025-12-04T10:12:54.2650594Z Running distributed/tensor/test_api 1/1 ... [2025-12-04 10:12:54.264502][4805.87241949] 2025-12-04T10:12:54.2651205Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:12:54.2652636Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
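Annotation: test_tp_style above covers the ColwiseParallel/RowwiseParallel styles. A minimal sketch of parallelize_module on a toy two-layer MLP, under the same torchrun/gloo assumptions; the model and dimensions are made up for illustration.

    # Tensor-parallel style sketch (run with: torchrun --nproc_per_node=2 this_file.py).
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor.parallel import parallelize_module, ColwiseParallel, RowwiseParallel

    dist.init_process_group("gloo")
    mesh = init_device_mesh("cpu", (dist.get_world_size(),))

    torch.manual_seed(0)                                  # keep the replicated input identical on all ranks
    mlp = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))
    # Shard the first linear column-wise and the second row-wise, the classic pairing.
    mlp = parallelize_module(mlp, mesh, {"0": ColwiseParallel(), "2": RowwiseParallel()})

    out = mlp(torch.randn(4, 8))   # input treated as replicated; output comes back as a plain tensor
    print(out.shape)               # torch.Size([4, 8])
    dist.destroy_process_group()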
[2025-12-04 10:12:54.264849] 2025-12-04T10:13:57.3381745Z 2025-12-04T10:13:57.3384490Z distributed/tensor/test_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_api_1.1_f4574b86db79cb55_.log 2025-12-04T10:13:57.3394031Z Running 18 items in this shard: test/distributed/tensor/test_api.py::DTensorAPITest::test_checkpoint_apis_check_partial_placement, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_casting, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_input_fn_output_fn, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_input_fn_output_fn_warning, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_meta, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_tensor_errors, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_tensor_rank, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_tensor_uneven_sharding, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_checkpoint_apis_check_partial_placement, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_casting, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_input_fn_output_fn, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_input_fn_output_fn_warning, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_meta, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_tensor_errors, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_tensor_rank, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_tensor_uneven_sharding 2025-12-04T10:13:57.3403052Z 2025-12-04T10:13:57.3403399Z Finished distributed/tensor/test_api 1/1 ... [2025-12-04 10:13:57.337881][4868.945797553], took 1.05min 2025-12-04T10:13:57.3581993Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_api/distributed.tensor.test_api-143a55cc9757e18a.xml 2025-12-04T10:13:57.4334180Z Running distributed/checkpoint/test_fsspec 1/1 ... [2025-12-04 10:13:57.432934][4869.040851408] 2025-12-04T10:13:57.4334852Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:13:57.4336131Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_fsspec.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:13:57.433293] 2025-12-04T10:14:11.8347663Z 2025-12-04T10:14:11.8348849Z distributed/checkpoint/test_fsspec 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_fsspec_1.1_8eaa241efddb416a_.log 2025-12-04T10:14:11.8350956Z Running 3 items in this shard: test/distributed/checkpoint/test_fsspec.py::TestFSSpec::test_fsspec, test/distributed/checkpoint/test_fsspec.py::TestFSSpec::test_overwrite, test/distributed/checkpoint/test_fsspec.py::TestFileSystem::test_remove_on_fail 2025-12-04T10:14:11.8352249Z 2025-12-04T10:14:11.8352670Z Finished distributed/checkpoint/test_fsspec 1/1 ... 
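Annotation: the DTensorAPITest run above exercises distribute_tensor/distribute_module. Below is a sketch of distribute_module with a partition_fn, under the same torchrun/gloo assumptions; the decision to shard only Linear parameters on dim 0 is an illustrative choice, not the test's.

    # distribute_module sketch (run with: torchrun --nproc_per_node=2 this_file.py).
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import distribute_module, distribute_tensor, Shard

    dist.init_process_group("gloo")
    mesh = init_device_mesh("cpu", (dist.get_world_size(),))

    def partition_fn(name, module, device_mesh):
        # Shard every Linear parameter on dim 0; everything else stays as-is (replicated on use).
        if isinstance(module, nn.Linear):
            for pname, param in module.named_parameters(recurse=False):
                dist_param = nn.Parameter(distribute_tensor(param, device_mesh, [Shard(0)]))
                module.register_parameter(pname, dist_param)

    model = distribute_module(nn.Linear(8, 8), mesh, partition_fn)
    print(model.weight.placements, model.weight.to_local().shape)   # (Shard(dim=0),) and the per-rank shard
    dist.destroy_process_group()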
[2025-12-04 10:14:11.834429][4883.442339588], took 0.24min 2025-12-04T10:14:11.8545660Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_fsspec/distributed.checkpoint.test_fsspec-2295d11b632387c0.xml 2025-12-04T10:14:11.9313006Z Running distributed/tensor/experimental/test_tp_transform 1/1 ... [2025-12-04 10:14:11.930744][4883.5386612] 2025-12-04T10:14:11.9314023Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:14:11.9315400Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/experimental/test_tp_transform.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:14:11.931093] 2025-12-04T10:14:37.3115415Z 2025-12-04T10:14:37.3116784Z distributed/tensor/experimental/test_tp_transform 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.experimental.test_tp_transform_1.1_d11081dcea691eaf_.log 2025-12-04T10:14:37.3119632Z Running 3 items in this shard: test/distributed/tensor/experimental/test_tp_transform.py::TensorParallelTest::test_tp_transform_e2e, test/distributed/tensor/experimental/test_tp_transform.py::TensorParallelTest::test_tp_transform_no_bias, test/distributed/tensor/experimental/test_tp_transform.py::TensorParallelTest::test_tp_transform_with_uncovered_op 2025-12-04T10:14:37.3122533Z 2025-12-04T10:14:37.3123066Z Finished distributed/tensor/experimental/test_tp_transform 1/1 ... [2025-12-04 10:14:37.311071][4908.918987722], took 0.42min 2025-12-04T10:14:37.3311460Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.experimental.test_tp_transform/distributed.tensor.experimental.test_tp_transform-af912528cabb656d.xml 2025-12-04T10:14:37.4151398Z Running distributed/checkpoint/test_traverse 1/1 ... [2025-12-04 10:14:37.414586][4909.022503412] 2025-12-04T10:14:37.4152047Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:14:37.4153347Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_traverse.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:14:37.414927] 2025-12-04T10:14:41.2890809Z 2025-12-04T10:14:41.2892079Z distributed/checkpoint/test_traverse 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_traverse_1.1_eea2c84c34471245_.log 2025-12-04T10:14:41.2896505Z Running 7 items in this shard: test/distributed/checkpoint/test_traverse.py::TestTraverse::test_get_element, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_set_element, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_traverse_doesnt_ignore_intermediate_collections, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_traverse_nested_dict, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_traverse_nested_list, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_traverse_shallow, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_traverse_with_ordered_dict 2025-12-04T10:14:41.2900072Z 2025-12-04T10:14:41.2900515Z Finished distributed/checkpoint/test_traverse 1/1 ... 
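Annotation: the checkpoint suites above (test_fsspec, test_traverse) go through torch.distributed.checkpoint (DCP). A minimal save/load sketch using checkpoint_id with a local directory; fsspec-style URLs resolve through the same entry points. Runs single-process (DCP falls back to its no-dist path) or under torchrun; keys and the toy model are illustrative.

    # torch.distributed.checkpoint (DCP) save/load sketch.
    import tempfile
    import torch
    import torch.nn as nn
    import torch.distributed.checkpoint as dcp

    model = nn.Linear(4, 4)
    ckpt_dir = tempfile.mkdtemp()

    dcp.save({"model": model.state_dict()}, checkpoint_id=ckpt_dir)   # one file per rank plus metadata

    restored = {"model": model.state_dict()}                          # load fills pre-allocated tensors in place
    dcp.load(restored, checkpoint_id=ckpt_dir)
    model.load_state_dict(restored["model"])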
[2025-12-04 10:14:41.288628][4912.896544739], took 0.06min 2025-12-04T10:14:41.3089148Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_traverse/distributed.checkpoint.test_traverse-f038bc92a00bd1c7.xml 2025-12-04T10:14:41.3516279Z Running distributed/tensor/test_random_ops 1/1 ... [2025-12-04 10:14:41.351172][4912.959089762] 2025-12-04T10:14:41.3516919Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:14:41.3518201Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_random_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:14:41.351536] 2025-12-04T10:16:09.3895045Z 2025-12-04T10:16:09.3896239Z distributed/tensor/test_random_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_random_ops_1.1_b2ded413b82ba64f_.log 2025-12-04T10:16:09.3913149Z Running 28 items in this shard: test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTest::test_fsdp_tp_model_meta_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTest::test_init_ops, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTest::test_init_with_user_generator, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTest::test_meta_tensor_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTest::test_tp_model_meta_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_deterministic_dropout_1d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_deterministic_rand_1d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_deterministic_uniform_2d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_manual_seed, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_manual_seed_submesh, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_philox_state_seed_roundtrip, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_pipeline_parallel_manual_seed, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_rng_tracker_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpsTest3D::test_hsdp_tp_model_meta_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTestWithLocalTensor::test_fsdp_tp_model_meta_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTestWithLocalTensor::test_init_ops, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTestWithLocalTensor::test_init_with_user_generator, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTestWithLocalTensor::test_meta_tensor_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTestWithLocalTensor::test_tp_model_meta_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_deterministic_dropout_1d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_deterministic_rand_1d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_deterministic_uniform_2d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_manual_seed, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_manual_seed_submesh, 
test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_philox_state_seed_roundtrip, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_pipeline_parallel_manual_seed, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_rng_tracker_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpsTest3DWithLocalTensor::test_hsdp_tp_model_meta_init 2025-12-04T10:16:09.3928613Z 2025-12-04T10:16:09.3929037Z Finished distributed/tensor/test_random_ops 1/1 ... [2025-12-04 10:16:09.388980][5000.99687848], took 1.47min 2025-12-04T10:16:09.4099868Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_random_ops/distributed.tensor.test_random_ops-a8f6b522aa6434af.xml 2025-12-04T10:16:09.5123883Z Running distributed/_composable/fsdp/test_fully_shard_logging 1/1 ... [2025-12-04 10:16:09.512113][5001.120030309] 2025-12-04T10:16:09.5124628Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:09.5126750Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_logging.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:16:09.512458] 2025-12-04T10:16:12.8782922Z 2025-12-04T10:16:12.8784259Z distributed/_composable/fsdp/test_fully_shard_logging 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_logging_1.1_334cd8181d21220c_.log 2025-12-04T10:16:12.8785877Z Running 0 items in this shard: 2025-12-04T10:16:12.8786101Z 2025-12-04T10:16:12.8786605Z Finished distributed/_composable/fsdp/test_fully_shard_logging 1/1 ... [2025-12-04 10:16:12.878082][5004.486000544], took 0.06min 2025-12-04T10:16:12.8985796Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_logging/distributed._composable.fsdp.test_fully_shard_logging-7e09cae3d59aa65e.xml 2025-12-04T10:16:12.9346530Z Running distributed/launcher/test_api 1/1 ... [2025-12-04 10:16:12.934176][5004.542092924] 2025-12-04T10:16:12.9347146Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:16.7586310Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/launcher/test_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:16:12.934530] 2025-12-04T10:16:16.7587673Z 2025-12-04T10:16:16.7588636Z distributed/launcher/test_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.launcher.test_api_1.1_4a83e51b1f3b8245_.log 2025-12-04T10:16:16.7590348Z Running 2 items in this shard: test/distributed/launcher/test_api.py::LauncherApiTest::test_launch_agent_default_signals, test/distributed/launcher/test_api.py::LauncherApiTest::test_launch_agent_sets_signals_env_var 2025-12-04T10:16:16.7591354Z 2025-12-04T10:16:16.7591714Z Finished distributed/launcher/test_api 1/1 ... 
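Annotation: distributed/launcher/test_api above drives the torchelastic launcher programmatically. A minimal sketch with LaunchConfig and elastic_launch; the worker function, run_id, and the rendezvous endpoint/port are illustrative assumptions (any free local port works for a single-node run).

    # torchelastic programmatic launch sketch: spins up 2 local workers running `worker`,
    # roughly equivalent to `torchrun --standalone --nproc_per_node=2 script.py`.
    import os
    from torch.distributed.launcher.api import LaunchConfig, elastic_launch

    def worker():
        # torchelastic sets RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT for each process
        return int(os.environ["RANK"])

    if __name__ == "__main__":
        config = LaunchConfig(
            min_nodes=1,
            max_nodes=1,
            nproc_per_node=2,
            rdzv_backend="c10d",
            rdzv_endpoint="localhost:29401",   # arbitrary free port for the single-node rendezvous
            run_id="example",
        )
        results = elastic_launch(config, worker)()   # dict mapping each worker's rank to its return value
        print(results)                               # e.g. {0: 0, 1: 1}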
[2025-12-04 10:16:16.757986][5008.365902291], took 0.06min 2025-12-04T10:16:16.7785342Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.launcher.test_api/distributed.launcher.test_api-15b87ceaa10651c5.xml 2025-12-04T10:16:16.8123540Z Running distributed/elastic/multiprocessing/test_api 1/1 ... [2025-12-04 10:16:16.811899][5008.419816702] 2025-12-04T10:16:16.8124276Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:16.8125898Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/multiprocessing/test_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:16:16.812347] 2025-12-04T10:16:20.6365337Z 2025-12-04T10:16:20.6366783Z distributed/elastic/multiprocessing/test_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.multiprocessing.test_api_1.1_4bf04d2a67164589_.log 2025-12-04T10:16:20.6372065Z Running 7 items in this shard: test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_handles_invalid_signals, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_handles_windows_signals, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_not_main_thread, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_registers_custom_signals, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_registers_default_signals, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_supports_sigusr1_and_sigusr2, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_terminate_process_handler 2025-12-04T10:16:20.6375856Z 2025-12-04T10:16:20.6376415Z Finished distributed/elastic/multiprocessing/test_api 1/1 ... [2025-12-04 10:16:20.636019][5012.243935583], took 0.06min 2025-12-04T10:16:20.6569761Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.elastic.multiprocessing.test_api/distributed.elastic.multiprocessing.test_api-12b95803d8942f3a.xml 2025-12-04T10:16:20.6915771Z Running distributed/fsdp/test_shard_utils 1/1 ... [2025-12-04 10:16:20.691111][5012.299029119] 2025-12-04T10:16:20.6916377Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:20.6917964Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_shard_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:16:20.691461] 2025-12-04T10:16:34.4918633Z 2025-12-04T10:16:34.4920022Z distributed/fsdp/test_shard_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_shard_utils_1.1_4e12f3568c69a797_.log 2025-12-04T10:16:34.4922983Z Running 2 items in this shard: test/distributed/fsdp/test_shard_utils.py::TestShardUtilsDistributed::test_create_chunk_sharded_tensor, test/distributed/fsdp/test_shard_utils.py::TestShardUtilsDistributedDTensor::test_create_chunk_dtensor 2025-12-04T10:16:34.4924519Z 2025-12-04T10:16:34.4924923Z Finished distributed/fsdp/test_shard_utils 1/1 ... 
[2025-12-04 10:16:34.491244][5026.099159773], took 0.23min 2025-12-04T10:16:34.5123468Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_shard_utils/distributed.fsdp.test_shard_utils-76ee73cffd398e77.xml 2025-12-04T10:16:34.6077525Z Running distributed/checkpoint/test_fsdp_optim_state 1/1 ... [2025-12-04 10:16:34.607132][5026.21504919] 2025-12-04T10:16:34.6078222Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:34.6079560Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_fsdp_optim_state.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:16:34.607481] 2025-12-04T10:16:50.7635053Z 2025-12-04T10:16:50.7636723Z distributed/checkpoint/test_fsdp_optim_state 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_fsdp_optim_state_1.1_d25d2159eaa83e63_.log 2025-12-04T10:16:50.7639298Z Running 2 items in this shard: test/distributed/checkpoint/test_fsdp_optim_state.py::FsdpOptimStateCheckpoint::test_load_sharded_optimizer_state_dict_pass_planner_False, test/distributed/checkpoint/test_fsdp_optim_state.py::FsdpOptimStateCheckpoint::test_load_sharded_optimizer_state_dict_pass_planner_True 2025-12-04T10:16:50.7640889Z 2025-12-04T10:16:50.7641621Z Finished distributed/checkpoint/test_fsdp_optim_state 1/1 ... [2025-12-04 10:16:50.763481][5042.371396647], took 0.27min 2025-12-04T10:16:50.7854478Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_optim_state/distributed.checkpoint.test_fsdp_optim_state-f29e492ac7e0fdff.xml 2025-12-04T10:16:50.8708529Z Running distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 ... [2025-12-04 10:16:50.870301][5042.478219153] 2025-12-04T10:16:50.8709235Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:50.8710592Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/e2e/test_e2e_save_and_load.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
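Annotation: test_fsdp_optim_state and the e2e save/load suite checkpoint model plus optimizer state through DCP. A sketch with the torch.distributed.checkpoint.state_dict helpers, which produce FQN-keyed state dicts that DCP can reshard on load; shown single-process with a plain Linear for brevity (the same calls apply to FSDP/DTensor models under torchrun), and the model/optimizer choices are illustrative.

    # Model + optimizer checkpoint sketch with the DCP state_dict helpers.
    import tempfile
    import torch
    import torch.nn as nn
    import torch.distributed.checkpoint as dcp
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    model = nn.Linear(4, 4)
    optim = torch.optim.AdamW(model.parameters(), lr=1e-3)
    model(torch.randn(2, 4)).sum().backward()
    optim.step()                                           # populate optimizer state before saving

    model_sd, optim_sd = get_state_dict(model, optim)      # FQN-keyed model and flattened optimizer state
    ckpt_dir = tempfile.mkdtemp()
    dcp.save({"model": model_sd, "optim": optim_sd}, checkpoint_id=ckpt_dir)

    # ... later / on restart:
    model_sd, optim_sd = get_state_dict(model, optim)
    dcp.load({"model": model_sd, "optim": optim_sd}, checkpoint_id=ckpt_dir)
    set_state_dict(model, optim, model_state_dict=model_sd, optim_state_dict=optim_sd)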
[2025-12-04 10:16:50.870651] 2025-12-04T10:19:33.5116383Z 2025-12-04T10:19:33.5118293Z distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.e2e.test_e2e_save_and_load_1.1_4cbd59f9e8ee7ec0_.log 2025-12-04T10:19:33.5145086Z Running 19 items in this shard: test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_different_ordered_state_dict_keys, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type0_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type2_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type4_zoc_True, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type5_zoc_True, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_True_async_checkpointer_type1_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_True_async_checkpointer_type3_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_False_model_type0, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_False_model_type1, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_False_model_type2, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_True_model_type0, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_True_model_type1, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_True_model_type2, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_no_dist, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_overwrite, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_partial_load, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_stateful_and_non_stateful_loads, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestNoCPU::test_no_cpu, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestInitStateDict::test_init_state_dict 2025-12-04T10:19:33.5167339Z 2025-12-04T10:19:33.5168228Z Finished distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 ... [2025-12-04 10:19:33.511642][5205.119552255], took 2.71min 2025-12-04T10:19:33.5359776Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.e2e.test_e2e_save_and_load/distributed.checkpoint.e2e.test_e2e_save_and_load-ea436a2b3918b4b7.xml 2025-12-04T10:19:34.0579674Z Uploading artifacts took 0.42 seconds 2025-12-04T10:19:34.0588313Z Running distributed/checkpoint/test_dtensor_resharding 1/1 ... 
[2025-12-04 10:19:34.058225][5205.666141033] 2025-12-04T10:19:34.0589227Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:19:34.0590586Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_dtensor_resharding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:19:34.058571] 2025-12-04T10:20:56.1418258Z 2025-12-04T10:20:56.1419574Z distributed/checkpoint/test_dtensor_resharding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_dtensor_resharding_1.1_a0990bee4dfbe749_.log 2025-12-04T10:20:56.1428424Z Running 10 items in this shard: test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardPlacementChange::test_1d_to_1d_reshard_placement_change_extensions0, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardPlacementChange::test_1d_to_1d_reshard_placement_change_extensions1, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardPlacementChange::test_1d_to_1d_reshard_placement_change_extensions2, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardPlacementChange::test_2d_to_2d_reshard_placement_change, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardMeshChange::test_1d_to_2d_reshard_mesh_change, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardMeshChange::test_2d_to_1d_reshard_mesh_change, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardMeshChange::test_dtensor_checkpoint_resharding_with_empty_shard, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardMeshChange::test_dtensor_checkpoint_with_uneven_shards, test/distributed/checkpoint/test_dtensor_resharding.py::TestCheckpointableReshard::test_uneven_reshard_with_checkpointable_api, test/distributed/checkpoint/test_dtensor_resharding.py::TestCheckpointableReshard::test_uneven_reshard_with_dtensor_shards_wrapper_api 2025-12-04T10:20:56.1435760Z 2025-12-04T10:20:56.1436238Z Finished distributed/checkpoint/test_dtensor_resharding 1/1 ... [2025-12-04 10:20:56.141337][5287.74925251], took 1.37min 2025-12-04T10:20:56.1635716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_dtensor_resharding/distributed.checkpoint.test_dtensor_resharding-850e82d898db0167.xml 2025-12-04T10:20:56.2484963Z Running distributed/fsdp/test_fsdp_memory 1/1 ... [2025-12-04 10:20:56.247964][5287.855881783] 2025-12-04T10:20:56.2485617Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:20:56.2486891Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_memory.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
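Annotation: test_dtensor_resharding above saves DTensors under one placement and loads them under another; DCP reshards at load time because the destination state_dict declares the new layout. A sketch under the same torchrun/gloo assumptions; shapes, key names, and placements are illustrative.

    # DCP resharding sketch (run with: torchrun --nproc_per_node=2 this_file.py).
    import tempfile
    import torch
    import torch.distributed as dist
    import torch.distributed.checkpoint as dcp
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import distribute_tensor, Shard, Replicate

    dist.init_process_group("gloo")
    mesh = init_device_mesh("cpu", (dist.get_world_size(),))

    # All ranks must write to the same directory, so rank 0 picks it and broadcasts the path.
    path = [tempfile.mkdtemp() if dist.get_rank() == 0 else None]
    dist.broadcast_object_list(path, src=0)
    ckpt_dir = path[0]

    global_w = torch.arange(16, dtype=torch.float32).reshape(4, 4)
    dcp.save({"w": distribute_tensor(global_w, mesh, [Shard(0)])}, checkpoint_id=ckpt_dir)   # row-sharded

    loaded = {"w": distribute_tensor(torch.zeros(4, 4), mesh, [Replicate()])}   # ask for a replicated copy
    dcp.load(loaded, checkpoint_id=ckpt_dir)                                    # DCP reshards while reading
    assert torch.equal(loaded["w"].to_local(), global_w)
    dist.destroy_process_group()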
[2025-12-04 10:20:56.248315] 2025-12-04T10:21:12.4986325Z 2025-12-04T10:21:12.4987512Z distributed/fsdp/test_fsdp_memory 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_memory_1.1_ac8e61e17ebeaaa5_.log 2025-12-04T10:21:12.4989347Z Running 2 items in this shard: test/distributed/fsdp/test_fsdp_memory.py::TestFSDPMemory::test_fsdp_memory_ckpt_ckpt, test/distributed/fsdp/test_fsdp_memory.py::TestFSDPMemory::test_fsdp_memory_ckpt_no_ckpt 2025-12-04T10:21:12.4990403Z 2025-12-04T10:21:12.4990798Z Finished distributed/fsdp/test_fsdp_memory 1/1 ... [2025-12-04 10:21:12.498137][5304.10604602], took 0.27min 2025-12-04T10:21:12.5204001Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_memory/distributed.fsdp.test_fsdp_memory-bd1d93d0f6b45624.xml 2025-12-04T10:21:12.6070593Z Running distributed/tensor/test_pointwise_ops 1/1 ... [2025-12-04 10:21:12.606486][5304.214402354] 2025-12-04T10:21:12.6071265Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:12.6072564Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_pointwise_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:12.606849] 2025-12-04T10:21:19.6371365Z 2025-12-04T10:21:19.6372421Z distributed/tensor/test_pointwise_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_pointwise_ops_1.1_fc7ea695ae4d24dd_.log 2025-12-04T10:21:19.6383460Z Running 18 items in this shard: test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_activations, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_backward, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_errors, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_inplace_op_partial_to_replicate, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_out, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_partial, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_replicate_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_activations, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_backward, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_errors, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_inplace_op_partial_to_replicate, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_out, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_replicate_add 2025-12-04T10:21:19.6392423Z 2025-12-04T10:21:19.6392807Z 
Finished distributed/tensor/test_pointwise_ops 1/1 ... [2025-12-04 10:21:19.636595][5311.244511251], took 0.12min 2025-12-04T10:21:19.6580950Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-8ffd5e5eb5f5ad7d.xml 2025-12-04T10:21:19.7639243Z Running distributed/checkpoint/test_compatibility 1/1 ... [2025-12-04 10:21:19.763318][5311.37123544] 2025-12-04T10:21:19.7639955Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:19.7641283Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_compatibility.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:19.763667] 2025-12-04T10:21:24.0379517Z 2025-12-04T10:21:24.0380746Z distributed/checkpoint/test_compatibility 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_compatibility_1.1_995845a47bb8bc7e_.log 2025-12-04T10:21:24.0384233Z Running 4 items in this shard: test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_metadata, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_sharded_tensor_dependency, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_storage_meta, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_with_v_2_3 2025-12-04T10:21:24.0386485Z 2025-12-04T10:21:24.0386917Z Finished distributed/checkpoint/test_compatibility 1/1 ... [2025-12-04 10:21:24.037620][5315.645535315], took 0.07min 2025-12-04T10:21:24.0588393Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_compatibility/distributed.checkpoint.test_compatibility-759684b03ee5bd2d.xml 2025-12-04T10:21:24.1059016Z Running distributed/_tools/test_mem_tracker 1/1 ... [2025-12-04 10:21:24.105423][5315.713339979] 2025-12-04T10:21:24.1059653Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:24.1060981Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_tools/test_mem_tracker.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:24.105804] 2025-12-04T10:21:28.6318988Z 2025-12-04T10:21:28.6320302Z distributed/_tools/test_mem_tracker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._tools.test_mem_tracker_1.1_c5962f3ebcf85955_.log 2025-12-04T10:21:28.6323208Z Running 3 items in this shard: test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_accelerator_tracker_equivalence, test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_tracker_attribution, test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_tracker_with_activation_checkpointing 2025-12-04T10:21:28.6324931Z 2025-12-04T10:21:28.6325331Z Finished distributed/_tools/test_mem_tracker 1/1 ... [2025-12-04 10:21:28.631344][5320.239259083], took 0.08min 2025-12-04T10:21:28.6528648Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._tools.test_mem_tracker/distributed._tools.test_mem_tracker-e6bb23aea30c734a.xml 2025-12-04T10:21:28.6878105Z Running distributed/elastic/test_control_plane 1/1 ... 
[2025-12-04 10:21:28.687172][5320.295089322] 2025-12-04T10:21:28.6879274Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:28.6880643Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/test_control_plane.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:28.687531] 2025-12-04T10:21:32.5619824Z 2025-12-04T10:21:32.5621512Z distributed/elastic/test_control_plane 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.test_control_plane_1.1_74d942263f51456c_.log 2025-12-04T10:21:32.5627206Z Running 10 items in this shard: test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle_with_json, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle_with_params, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_traceback, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_get_handler_names, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_get_handler_nonexistant, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_run_handler, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_tcp, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_wait_counter_values, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_worker_server 2025-12-04T10:21:32.5632021Z 2025-12-04T10:21:32.5632557Z Finished distributed/elastic/test_control_plane 1/1 ... [2025-12-04 10:21:32.561704][5324.169620373], took 0.06min 2025-12-04T10:21:32.5839412Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.elastic.test_control_plane/distributed.elastic.test_control_plane-8adada293373a225.xml 2025-12-04T10:21:32.6175776Z Running distributed/test_fake_pg 1/1 ... [2025-12-04 10:21:32.616975][5324.224893186] 2025-12-04T10:21:32.6176498Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:32.6177894Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_fake_pg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
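Annotation: test_fake_pg above uses the "fake" backend so collective-using code runs in a single process with no real communication (handy for tracing or debugging multi-rank code paths). The FakeStore helper lives under torch.testing._internal, i.e. it is private test tooling, so treat this sketch accordingly; the world size is arbitrary.

    # Single-process "fake" process group sketch: collectives dispatch but move no data.
    import torch
    import torch.distributed as dist
    from torch.testing._internal.distributed.fake_pg import FakeStore

    dist.init_process_group(backend="fake", rank=0, world_size=4, store=FakeStore())

    t = torch.ones(8)
    dist.all_reduce(t)            # no-op data-wise, but the collective call path is exercised
    print(dist.get_world_size())  # 4, even though only one real process exists
    dist.destroy_process_group()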
[2025-12-04 10:21:32.617332] 2025-12-04T10:21:37.1938111Z 2025-12-04T10:21:37.1939351Z distributed/test_fake_pg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_fake_pg_1.1_ecf9a296b2457f78_.log 2025-12-04T10:21:37.1945550Z Running 16 items in this shard: test/distributed/test_fake_pg.py::TestFakePG::test_all_reduce, test/distributed/test_fake_pg.py::TestFakePG::test_allgather, test/distributed/test_fake_pg.py::TestFakePG::test_alltoall, test/distributed/test_fake_pg.py::TestFakePG::test_alltoall_base, test/distributed/test_fake_pg.py::TestFakePG::test_broadcast, test/distributed/test_fake_pg.py::TestFakePG::test_construct_fsdp, test/distributed/test_fake_pg.py::TestFakePG::test_error_on_collective, test/distributed/test_fake_pg.py::TestFakePG::test_fake_pg_tracing, test/distributed/test_fake_pg.py::TestFakePG::test_fake_process_group_direct_usage_error, test/distributed/test_fake_pg.py::TestFakePG::test_fake_process_group_proper_usage_dispatch, test/distributed/test_fake_pg.py::TestFakePG::test_fsdp_fake_e2e, test/distributed/test_fake_pg.py::TestFakePG::test_fsdp_tp_fake_e2e, test/distributed/test_fake_pg.py::TestFakePG::test_recv, test/distributed/test_fake_pg.py::TestFakePG::test_reduce_scatter, test/distributed/test_fake_pg.py::TestFakePG::test_scatter, test/distributed/test_fake_pg.py::TestFakePG::test_send 2025-12-04T10:21:37.1951069Z 2025-12-04T10:21:37.1951413Z Finished distributed/test_fake_pg 1/1 ... [2025-12-04 10:21:37.193337][5328.801247965], took 0.08min 2025-12-04T10:21:37.2157401Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_fake_pg/distributed.test_fake_pg-79e3fe3f86c7485d.xml 2025-12-04T10:21:37.2508853Z Running distributed/checkpoint/test_fsdp_model_state 1/1 ... [2025-12-04 10:21:37.250381][5328.858298217] 2025-12-04T10:21:37.2509560Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:37.2510887Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_fsdp_model_state.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:37.250740] 2025-12-04T10:21:53.9090390Z 2025-12-04T10:21:53.9092016Z distributed/checkpoint/test_fsdp_model_state 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_fsdp_model_state_1.1_0d5362771b48c12a_.log 2025-12-04T10:21:53.9094492Z Running 2 items in this shard: test/distributed/checkpoint/test_fsdp_model_state.py::FsdpModelStateCheckpoint::test_fsdp_model_state_no_resharding, test/distributed/checkpoint/test_fsdp_model_state.py::FsdpModelStateCheckpoint::test_fsdp_model_state_with_resharding 2025-12-04T10:21:53.9095853Z 2025-12-04T10:21:53.9096392Z Finished distributed/checkpoint/test_fsdp_model_state 1/1 ... [2025-12-04 10:21:53.908651][5345.516567096], took 0.28min 2025-12-04T10:21:53.9315346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_model_state/distributed.checkpoint.test_fsdp_model_state-d2d7dab49696755b.xml 2025-12-04T10:21:54.0360583Z Running distributed/test_functional_api 1/1 ... 
[2025-12-04 10:21:54.035427][5345.643345113] 2025-12-04T10:21:54.0361272Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:54.0362704Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_functional_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:54.035774] 2025-12-04T10:24:43.8680110Z 2025-12-04T10:24:43.8681251Z distributed/test_functional_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_functional_api_1.1_d60bb00edf6e8a81_.log 2025-12-04T10:24:43.8689743Z Running 11 items in this shard: test/distributed/test_functional_api.py::TestMetaCollectives::test_all_reduce, test/distributed/test_functional_api.py::TestMakeFx::test_all_reduce_tracing, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_gather_into_tensor_coalesced_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_to_all_single_1d_input_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_to_all_single_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_to_all_single_split_sizes_none_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_tracing_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_tracing_with_dce_code_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_tracing_with_fakepg_cuda, test/distributed/test_functional_api.py::TestDistributedBackendCollectivesWithWorldSize4CUDA::test_permute_tensor_with_sub_group_cuda, test/distributed/test_functional_api.py::TestFunctionalAutogradWithDistributedBackendCUDA::test_all_to_all_single_cuda 2025-12-04T10:24:43.8696245Z 2025-12-04T10:24:43.8696927Z Finished distributed/test_functional_api 1/1 ... [2025-12-04 10:24:43.867521][5515.475436861], took 2.83min 2025-12-04T10:24:43.8903330Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_functional_api/distributed.test_functional_api-d3092064f68d2f41.xml 2025-12-04T10:24:43.9723917Z Running distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 ... [2025-12-04 10:24:43.972132][5515.580033594] 2025-12-04T10:24:43.9724676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:24:43.9726741Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
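Annotation: test_functional_api above covers torch.distributed._functional_collectives, the traceable collectives that return new tensors instead of mutating in place. A sketch under the torchrun/gloo assumptions; the module path is private (it is the one the test itself imports), so it may shift between releases.

    # Functional (traceable) collectives sketch (run with: torchrun --nproc_per_node=2 this_file.py).
    import torch
    import torch.distributed as dist
    import torch.distributed._functional_collectives as funcol

    dist.init_process_group("gloo")
    rank, world = dist.get_rank(), dist.get_world_size()

    x = torch.full((4,), float(rank))
    summed = funcol.all_reduce(x, "sum", group=dist.group.WORLD)            # x is left untouched
    gathered = funcol.all_gather_tensor(x, gather_dim=0, group=dist.group.WORLD)

    print(summed.sum().item())    # 4 * sum(range(world)); using the result forces the async wait
    print(gathered.shape)         # (4 * world,)
    dist.destroy_process_group()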
[2025-12-04 10:24:43.972475] 2025-12-04T10:25:04.3881012Z 2025-12-04T10:25:04.3884461Z distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_clip_grad_norm__1.1_76ba1390d272d622_.log 2025-12-04T10:25:04.3886929Z Running 2 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_.py::TestClipGradNormWorldSize2::test_clip_grad_norm_1d, test/distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_.py::TestClipGradNormWorldSize4::test_clip_grad_norm_2d 2025-12-04T10:25:04.3888280Z 2025-12-04T10:25:04.3888793Z Finished distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 ... [2025-12-04 10:25:04.387528][5535.995444593], took 0.34min 2025-12-04T10:25:04.4099845Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_-2322cac9c0cc490f.xml 2025-12-04T10:25:04.4915990Z Running distributed/tensor/debug/test_comm_mode 1/1 ... [2025-12-04 10:25:04.491058][5536.098976103] 2025-12-04T10:25:04.4916635Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:25:04.4918177Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/debug/test_comm_mode.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:25:04.491400] 2025-12-04T10:25:08.7168201Z 2025-12-04T10:25:08.7169435Z distributed/tensor/debug/test_comm_mode 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.debug.test_comm_mode_1.1_40ca723c6c817b86_.log 2025-12-04T10:25:08.7172173Z Running 4 items in this shard: test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode, test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode_coalesced, test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode_with_c10d, test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode_with_dtensor 2025-12-04T10:25:08.7174266Z 2025-12-04T10:25:08.7174685Z Finished distributed/tensor/debug/test_comm_mode 1/1 ... [2025-12-04 10:25:08.716329][5540.324245649], took 0.07min 2025-12-04T10:25:08.7389827Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.debug.test_comm_mode/distributed.tensor.debug.test_comm_mode-8cc829f047ed6143.xml 2025-12-04T10:25:08.7721724Z Running distributed/test_dist2 1/1 ... [2025-12-04 10:25:08.771545][5540.379462499] 2025-12-04T10:25:08.7722326Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:25:08.7723574Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_dist2.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
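Annotation: test_comm_mode above checks CommDebugMode, which counts the collectives issued by DTensor (and functional-collective) code run under it. A sketch under the torchrun/gloo assumptions; the import path follows the public torch.distributed.tensor.debug module in recent releases, and the expected count shown in the comment assumes the Shard-to-Replicate redistribute maps to a single all-gather on this backend.

    # CommDebugMode sketch (run with: torchrun --nproc_per_node=2 this_file.py).
    import torch
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import distribute_tensor, Shard, Replicate
    from torch.distributed.tensor.debug import CommDebugMode

    dist.init_process_group("gloo")
    mesh = init_device_mesh("cpu", (dist.get_world_size(),))
    torch.manual_seed(0)
    dt = distribute_tensor(torch.randn(8, 8), mesh, [Shard(0)])

    comm_mode = CommDebugMode()
    with comm_mode:
        dt.redistribute(mesh, [Replicate()])        # Shard -> Replicate needs a gather across the mesh

    print(comm_mode.get_total_counts())             # total number of collectives recorded
    print(comm_mode.get_comm_counts())              # per-collective breakdown
    dist.destroy_process_group()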
[2025-12-04 10:25:08.771889] 2025-12-04T10:27:33.9537647Z 2025-12-04T10:27:33.9538623Z distributed/test_dist2 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_dist2_1.1_cc2e2f70acaf1086_.log 2025-12-04T10:27:33.9552364Z Running 34 items in this shard: test/distributed/test_dist2.py::ProcessGroupTest::test_context_manager, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_allgather, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_allreduce, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_alltoall_base, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_barrier, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_broadcast, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_gather, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_group_split, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_reduce, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_reduce_scatter, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_remote_group_merge, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_scatter, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_allgather, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_allreduce, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_alltoall_base, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_barrier, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_broadcast, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_gather, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_group_split, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_reduce, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_reduce_scatter, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_remote_group_merge, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_scatter, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_allgather, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_allreduce, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_alltoall_base, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_barrier, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_broadcast, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_gather, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_group_split, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_reduce, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_reduce_scatter, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_remote_group_merge, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_scatter 2025-12-04T10:27:33.9564686Z 2025-12-04T10:27:33.9565020Z Finished distributed/test_dist2 1/1 ... [2025-12-04 10:27:33.953371][5685.561285542], took 2.42min 2025-12-04T10:27:33.9769126Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_dist2/distributed.test_dist2-7a48db8512284abb.xml 2025-12-04T10:27:34.0618582Z Running distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 ... 
[2025-12-04 10:27:34.061277][5685.669194528] 2025-12-04T10:27:34.0619351Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:27:34.0620998Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_grad_scaler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:27:34.061615] 2025-12-04T10:27:45.3552115Z 2025-12-04T10:27:45.3553460Z distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_grad_scaler_1.1_5aa2313403ba4568_.log 2025-12-04T10:27:45.3555315Z Running 1 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_grad_scaler.py::TestFullyShardGradientScaler::test_gradient_scaler 2025-12-04T10:27:45.3556087Z 2025-12-04T10:27:45.3556600Z Finished distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 ... [2025-12-04 10:27:45.354755][5696.962670424], took 0.19min 2025-12-04T10:27:45.3781934Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_grad_scaler/distributed._composable.fsdp.test_fully_shard_grad_scaler-5e3c33eaf29838b0.xml 2025-12-04T10:27:45.4680847Z Running distributed/launcher/test_run 1/1 ... [2025-12-04 10:27:45.467519][5697.075435831] 2025-12-04T10:27:45.4681451Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:27:45.4682693Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/launcher/test_run.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
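Annotation: test_fully_shard_grad_scaler above checks that torch.amp.GradScaler works when gradients are the DTensors produced by the FSDP2 fully_shard API. A sketch assuming a PyTorch build where fully_shard is exported from torch.distributed.fsdp (older builds expose it under torch.distributed._composable.fsdp), CUDA available, and torchrun providing one GPU per rank; model and optimizer choices are illustrative.

    # FSDP2 + GradScaler sketch (run with: torchrun --nproc_per_node=2 this_file.py, one GPU per rank).
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.distributed.fsdp import fully_shard

    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank())

    model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16)).cuda()
    for layer in model:
        if isinstance(layer, nn.Linear):
            fully_shard(layer)
    fully_shard(model)                                 # parameters become DTensors sharded across ranks

    optim = torch.optim.SGD(model.parameters(), lr=1e-2)
    scaler = torch.amp.GradScaler("cuda")

    with torch.autocast("cuda", dtype=torch.float16):
        loss = model(torch.randn(8, 16, device="cuda")).sum()
    scaler.scale(loss).backward()                      # scaled grads are DTensors; unscale/step handle them
    scaler.step(optim)
    scaler.update()
    dist.destroy_process_group()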
[2025-12-04 10:27:45.467875] 2025-12-04T10:28:52.9190610Z 2025-12-04T10:28:52.9194152Z distributed/launcher/test_run 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.launcher.test_run_1.1_b22d13de769d84ff_.log 2025-12-04T10:28:52.9219050Z Running 26 items in this shard: test/distributed/launcher/test_run.py::ElasticLaunchTest::test_capture_logs_using_default_logs_specs, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_init_method_env_with_torchelastic, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_init_method_tcp_with_torchelastic, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_is_not_torchelastic_launched, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_is_torchelastic_launched, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_is_torchelastic_launched_with_logs_spec_defined, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic_agent_raise_exception, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic_multiple_agents, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic_worker_raise_exception, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_run_path, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_shutdown, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_standalone, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_bash, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_default_nproc, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_python, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_python_caffe2_bc, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_with_env_vars, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_logs_logs_spec_entrypoint_must_be_defined, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_min_max_nodes_parse, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_gpu_launch_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_launch_auto_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_launch_number_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_launch_unknown_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_xpu_launch_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_virtual_local_rank 2025-12-04T10:28:52.9241876Z 2025-12-04T10:28:52.9242539Z Finished distributed/launcher/test_run 1/1 ... [2025-12-04 10:28:52.918996][5764.526912136], took 1.12min 2025-12-04T10:28:52.9444041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.launcher.test_run/distributed.launcher.test_run-eeaaeb50473e3b00.xml 2025-12-04T10:28:53.0300410Z Running distributed/fsdp/test_fsdp_backward_prefetch 1/1 ... 
[2025-12-04 10:28:53.029786][5764.637703657] 2025-12-04T10:28:53.0301534Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:28:53.0305209Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_backward_prefetch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:28:53.030191] 2025-12-04T10:29:03.2209710Z 2025-12-04T10:29:03.2211028Z distributed/fsdp/test_fsdp_backward_prefetch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_backward_prefetch_1.1_29df4062c54c1e1a_.log 2025-12-04T10:29:03.2212629Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_backward_prefetch.py::TestBackwardPrefetch::test_backward_prefetch 2025-12-04T10:29:03.2213654Z 2025-12-04T10:29:03.2214105Z Finished distributed/fsdp/test_fsdp_backward_prefetch 1/1 ... [2025-12-04 10:29:03.220456][5774.828371753], took 0.17min 2025-12-04T10:29:03.2440392Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_backward_prefetch/distributed.fsdp.test_fsdp_backward_prefetch-9d6c65a3bd838e6b.xml 2025-12-04T10:29:03.3339759Z Running distributed/checkpoint/test_checkpoint 1/1 ... [2025-12-04 10:29:03.333736][5774.941652027] 2025-12-04T10:29:03.3340451Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:29:03.3343208Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_checkpoint.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:29:03.334118] 2025-12-04T10:29:49.7681378Z 2025-12-04T10:29:49.7682539Z distributed/checkpoint/test_checkpoint 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_checkpoint_1.1_d7eb3fb6652ade87_.log 2025-12-04T10:29:49.7688042Z Running 8 items in this shard: test/distributed/checkpoint/test_checkpoint.py::TestDistributedCheckpointing::test_default_metadata, test/distributed/checkpoint/test_checkpoint.py::TestDistributedCheckpointing::test_tensor_metadata_with_missing_rank_spec, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_dummy_reader_works, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_dummy_writer_works, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_load_error_handling, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_load_error_handling_no_dist, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_save_error_handling, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_save_error_handling_no_dist 2025-12-04T10:29:49.7692471Z 2025-12-04T10:29:49.7692900Z Finished distributed/checkpoint/test_checkpoint 1/1 ... [2025-12-04 10:29:49.767816][5821.375732101], took 0.77min 2025-12-04T10:29:49.7919503Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_checkpoint/distributed.checkpoint.test_checkpoint-698955a0be6378e2.xml 2025-12-04T10:29:49.8779184Z Running distributed/_pycute/test_coalesce 1/1 ... 
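Annotation: test_fsdp_backward_prefetch above exercises the classic FSDP wrapper's prefetch policy, which controls when the next parameter all-gather is issued during backward. A minimal sketch of the constructor option, assuming torchrun with one CUDA device per rank; the toy model and sizes are illustrative.

    # FSDP BackwardPrefetch sketch (run with: torchrun --nproc_per_node=2 this_file.py, one GPU per rank).
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, BackwardPrefetch

    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank())

    model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32)).cuda()
    # BACKWARD_PRE overlaps the next all-gather with the current gradient computation;
    # BACKWARD_POST gives up some overlap for lower peak memory.
    fsdp_model = FSDP(model, backward_prefetch=BackwardPrefetch.BACKWARD_PRE)

    fsdp_model(torch.randn(8, 32, device="cuda")).sum().backward()
    dist.destroy_process_group()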
[2025-12-04 10:29:49.877645][5821.485550262] 2025-12-04T10:29:49.8779822Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:29:49.8781872Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_coalesce.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:29:49.877999] 2025-12-04T10:29:53.6522975Z 2025-12-04T10:29:53.6524133Z distributed/_pycute/test_coalesce 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_coalesce_1.1_b9854b582e22535e_.log 2025-12-04T10:29:53.6525539Z Running 1 items in this shard: test/distributed/_pycute/test_coalesce.py::TestCoalesce::test_coalesce 2025-12-04T10:29:53.6528000Z 2025-12-04T10:29:53.6528690Z Finished distributed/_pycute/test_coalesce 1/1 ... [2025-12-04 10:29:53.651773][5825.259685699], took 0.06min 2025-12-04T10:29:53.6757234Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_coalesce/distributed._pycute.test_coalesce-d2727b6d77166552.xml 2025-12-04T10:29:53.7116189Z Running distributed/_pycute/test_complement 1/1 ... [2025-12-04 10:29:53.711080][5825.318996735] 2025-12-04T10:29:53.7116831Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:29:53.7118114Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_complement.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:29:53.711428] 2025-12-04T10:29:57.4854645Z 2025-12-04T10:29:57.4855779Z distributed/_pycute/test_complement 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_complement_1.1_ccd05958479ced51_.log 2025-12-04T10:29:57.4857596Z Running 1 items in this shard: test/distributed/_pycute/test_complement.py::TestComplement::test_complement 2025-12-04T10:29:57.4858181Z 2025-12-04T10:29:57.4858599Z Finished distributed/_pycute/test_complement 1/1 ... [2025-12-04 10:29:57.484890][5829.092805789], took 0.06min 2025-12-04T10:29:57.5086790Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_complement/distributed._pycute.test_complement-323506218bd25d4f.xml 2025-12-04T10:29:57.5413356Z Running distributed/_pycute/test_composition 1/1 ... [2025-12-04 10:29:57.540735][5829.148652684] 2025-12-04T10:29:57.5414026Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:29:57.5415307Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_composition.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:29:57.541083] 2025-12-04T10:30:01.3151532Z 2025-12-04T10:30:01.3152898Z distributed/_pycute/test_composition 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_composition_1.1_6a9f660c56ddbb95_.log 2025-12-04T10:30:01.3154406Z Running 1 items in this shard: test/distributed/_pycute/test_composition.py::TestComposition::test_composition 2025-12-04T10:30:01.3154982Z 2025-12-04T10:30:01.3155404Z Finished distributed/_pycute/test_composition 1/1 ... 
[2025-12-04 10:30:01.314824][5832.922740995], took 0.06min 2025-12-04T10:30:01.3385068Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_composition/distributed._pycute.test_composition-91e42d2ac7610498.xml 2025-12-04T10:30:01.3734218Z Running distributed/_pycute/test_int_tuple 1/1 ... [2025-12-04 10:30:01.372838][5832.980754715] 2025-12-04T10:30:01.3734864Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:30:01.3736155Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_int_tuple.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:30:01.373189] 2025-12-04T10:30:05.1977813Z 2025-12-04T10:30:05.1978914Z distributed/_pycute/test_int_tuple 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_int_tuple_1.1_1b6829b59a3a12af_.log 2025-12-04T10:30:05.1984790Z Running 12 items in this shard: test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_basic, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_idx2crd_roundtrip, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_int_with_tuple_shape, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_none, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_tuple, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_idx2crd_basic, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_idx2crd_crd2idx_roundtrip, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_idx2crd_tuple, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_inner_product, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_product, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_shape_div, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_suffix_product 2025-12-04T10:30:05.1989662Z 2025-12-04T10:30:05.1990148Z Finished distributed/_pycute/test_int_tuple 1/1 ... [2025-12-04 10:30:05.197211][5836.805125867], took 0.06min 2025-12-04T10:30:05.2213714Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_int_tuple/distributed._pycute.test_int_tuple-1604350619512e65.xml 2025-12-04T10:30:05.2517261Z Running distributed/_pycute/test_left_inverse 1/1 ... [2025-12-04 10:30:05.251112][5836.859028412] 2025-12-04T10:30:05.2517960Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:30:05.2519245Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_left_inverse.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:30:05.251510] 2025-12-04T10:30:09.0253826Z 2025-12-04T10:30:09.0254902Z distributed/_pycute/test_left_inverse 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_left_inverse_1.1_e810fe2e4745b377_.log 2025-12-04T10:30:09.0256408Z Running 1 items in this shard: test/distributed/_pycute/test_left_inverse.py::TestLeftInverse::test_left_inverse 2025-12-04T10:30:09.0257201Z 2025-12-04T10:30:09.0257632Z Finished distributed/_pycute/test_left_inverse 1/1 ... 
[2025-12-04 10:30:09.024928][5840.632844521], took 0.06min 2025-12-04T10:30:09.0491724Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_left_inverse/distributed._pycute.test_left_inverse-7b550f03a54828f5.xml 2025-12-04T10:30:09.0847088Z Running distributed/_pycute/test_right_inverse 1/1 ... [2025-12-04 10:30:09.084059][5840.691976263] 2025-12-04T10:30:09.0847762Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:30:09.0849044Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_right_inverse.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:30:09.084411] 2025-12-04T10:30:12.8581541Z 2025-12-04T10:30:12.8582758Z distributed/_pycute/test_right_inverse 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_right_inverse_1.1_c9aa035dc9548e77_.log 2025-12-04T10:30:12.8584245Z Running 1 items in this shard: test/distributed/_pycute/test_right_inverse.py::TestRightInverse::test_right_inverse 2025-12-04T10:30:12.8584847Z 2025-12-04T10:30:12.8585272Z Finished distributed/_pycute/test_right_inverse 1/1 ... [2025-12-04 10:30:12.857837][5844.465753575], took 0.06min 2025-12-04T10:30:12.8822360Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_right_inverse/distributed._pycute.test_right_inverse-5437f0847845b913.xml 2025-12-04T10:30:12.9176028Z Running distributed/_composable/test_replicate 1/1 ... [2025-12-04 10:30:12.917065][5844.524982886] 2025-12-04T10:30:12.9176827Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:30:12.9178339Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/test_replicate.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:30:12.917423] 2025-12-04T10:31:27.6172887Z 2025-12-04T10:31:27.6174465Z distributed/_composable/test_replicate 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_replicate_1.1_ede2d02b7e8a4250_.log 2025-12-04T10:31:27.6184701Z Running 17 items in this shard: test/distributed/_composable/test_replicate.py::ReplicateStateDictTest::test_replicate_non_root_multiple_save_load, test/distributed/_composable/test_replicate.py::ReplicateStateDictTest::test_replicate_single_module_save_load, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_device_id, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_ignore_module, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_move_args_kwargs_to_device, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_multi_module, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_single_module, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_with_kwargs, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_wrong_device_id_type, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_device_id, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_fully_shard_init, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_ignore_module, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_move_args_kwargs_to_device, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_multi_module, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_single_module, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_with_kwargs, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_wrong_device_id_type 2025-12-04T10:31:27.6193383Z 2025-12-04T10:31:27.6193789Z Finished distributed/_composable/test_replicate 1/1 ... [2025-12-04 10:31:27.616680][5919.224595513], took 1.24min 2025-12-04T10:31:27.6416392Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.test_replicate/distributed._composable.test_replicate-5594e5fd77ce79b5.xml 2025-12-04T10:31:27.7258143Z Running distributed/checkpoint/test_hsdp_checkpoint 1/1 ... [2025-12-04 10:31:27.725229][5919.333146113] 2025-12-04T10:31:27.7258849Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:31:27.7260222Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_hsdp_checkpoint.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:31:27.725574] 2025-12-04T10:31:59.1199137Z 2025-12-04T10:31:59.1200463Z distributed/checkpoint/test_hsdp_checkpoint 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_hsdp_checkpoint_1.1_38b6379e9fe79671_.log 2025-12-04T10:31:59.1204122Z Running 4 items in this shard: test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_checkpoint_is_even_sharded_model_False, test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_checkpoint_is_even_sharded_model_True, test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_fsdp_checkpoint_conversion_is_even_sharded_model_False, test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_fsdp_checkpoint_conversion_is_even_sharded_model_True 2025-12-04T10:31:59.1206868Z 2025-12-04T10:31:59.1207346Z Finished distributed/checkpoint/test_hsdp_checkpoint 1/1 ... [2025-12-04 10:31:59.119653][5950.727568389], took 0.52min 2025-12-04T10:31:59.1446345Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_hsdp_checkpoint/distributed.checkpoint.test_hsdp_checkpoint-293bcc74b378a9a0.xml 2025-12-04T10:31:59.2272590Z Running distributed/tensor/parallel/test_parallelize_api 1/1 ... [2025-12-04 10:31:59.226655][5950.834573142] 2025-12-04T10:31:59.2273295Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:31:59.2274655Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/parallel/test_parallelize_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:31:59.227000] 2025-12-04T10:33:46.7644824Z 2025-12-04T10:33:46.7646268Z distributed/tensor/parallel/test_parallelize_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.parallel.test_parallelize_api_1.1_a79c3b02a80366e9_.log 2025-12-04T10:33:46.7671597Z Running 32 items in this shard: test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_empty_plan, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_linear_col_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_linear_row_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_mlp_with_module_api, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_mlp_with_module_api_nested, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_multi_wildcard, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_src_data_rank, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_digit, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_no_match, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_question, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_root_module, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_star, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_prepare_module_input, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_prepare_module_input_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_prepare_module_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_under_devicemesh_context, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_empty_plan, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_linear_col_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_linear_row_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_mlp_with_module_api, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_mlp_with_module_api_nested, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_multi_wildcard, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_src_data_rank, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_digit, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_no_match, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_question, 
test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_root_module, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_star, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_prepare_module_input, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_prepare_module_input_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_prepare_module_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_under_devicemesh_context 2025-12-04T10:33:46.7692985Z 2025-12-04T10:33:46.7693495Z Finished distributed/tensor/parallel/test_parallelize_api 1/1 ... [2025-12-04 10:33:46.763960][6058.371876312], took 1.79min 2025-12-04T10:33:46.7897408Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.parallel.test_parallelize_api/distributed.tensor.parallel.test_parallelize_api-e24bc2790e3eed77.xml 2025-12-04T10:33:46.8978387Z Running distributed/fsdp/test_fsdp_state_dict 1/2 ... [2025-12-04 10:33:46.897170][6058.505087901] 2025-12-04T10:33:46.8979034Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:33:46.8980351Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_state_dict.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:33:46.897509] 2025-12-04T10:40:21.2626989Z 2025-12-04T10:40:21.2628176Z distributed/fsdp/test_fsdp_state_dict 1/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_state_dict_1.2_f864b6fe160d675b_.log 2025-12-04T10:40:21.2702426Z Running 78 items in this shard: test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_False, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_keys_state_dict_type_local_state_dict, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_keys_state_dict_type_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_both_after_wrap_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_dest_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_source_after_wrap_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_source_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_after_wrap_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_after_wrap_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_dest_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_dest_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_source_after_wrap_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_source_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_full_state_dict_missing_unexpected_keys_cleaned, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_False_state_dict_rank0_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_True_state_dict_rank0_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_False_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_True_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_False_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_False_fsdp_root_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_True_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_rank0_offload_save_load_flow_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_save_load_flow_state_dict_type_sharded_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_type, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_False_ignore_inner_False_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_False_ignore_inner_True_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_False_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_False_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_True_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_False_ignore_inner_False_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_False_ignore_inner_True_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_True_ignore_inner_True_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_shared_parameters_state_dict_type_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_torch_save_load, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict4GPUs::test_local_state_dict_reshard 2025-12-04T10:40:21.2773120Z 
2025-12-04T10:40:21.2773532Z Finished distributed/fsdp/test_fsdp_state_dict 1/2 ... [2025-12-04 10:40:21.264033][6452.871945679], took 6.57min 2025-12-04T10:40:21.2899405Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_state_dict/distributed.fsdp.test_fsdp_state_dict-3c13b82ce7076bc1.xml 2025-12-04T10:40:21.8294585Z Uploading artifacts took 0.46 seconds 2025-12-04T10:40:21.8295569Z Running distributed/_pycute/test_typing 1/1 ... [2025-12-04 10:40:21.829407][6453.437324358] 2025-12-04T10:40:21.8296190Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:21.8300008Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_typing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:40:21.829761] 2025-12-04T10:40:25.6539434Z 2025-12-04T10:40:25.6540568Z distributed/_pycute/test_typing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_typing_1.1_70d9a252095d6a68_.log 2025-12-04T10:40:25.6542184Z Running 1 items in this shard: test/distributed/_pycute/test_typing.py::TestTyping::test_typing 2025-12-04T10:40:25.6542788Z 2025-12-04T10:40:25.6543171Z Finished distributed/_pycute/test_typing 1/1 ... [2025-12-04 10:40:25.653533][6457.261448532], took 0.06min 2025-12-04T10:40:25.6780599Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_typing/distributed._pycute.test_typing-1c9aabc95fed14a1.xml 2025-12-04T10:40:25.7141421Z Running distributed/test_distributed_spawn 1/9 ... [2025-12-04 10:40:25.713926][6457.321842852] 2025-12-04T10:40:25.7143805Z Running distributed tests for the test backend with env init_method 2025-12-04T10:40:25.7145738Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:25.7149473Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:40:25.714759] 2025-12-04T10:40:29.2914232Z 2025-12-04T10:40:29.2915325Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_8732ec05eb19aa05_.log 2025-12-04T10:40:29.2916396Z Running 0 items in this shard: 2025-12-04T10:40:29.2916610Z 2025-12-04T10:40:29.2919067Z Running distributed tests for the test backend with file init_method 2025-12-04T10:40:29.2921055Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:29.2925506Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:40:29.292326] 2025-12-04T10:40:32.8707319Z 2025-12-04T10:40:32.8708479Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_28ca104a37c9a833_.log 2025-12-04T10:40:32.8709650Z Running 0 items in this shard: 2025-12-04T10:40:32.8709879Z 2025-12-04T10:40:32.8714906Z Running distributed tests for the mpi backend with env init_method 2025-12-04T10:40:33.0015557Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:33.0018791Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:40:33.001512] 2025-12-04T10:40:37.1935200Z 2025-12-04T10:40:37.1936330Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_4a0940f8014b8eef_.log 2025-12-04T10:40:37.1937772Z Running 0 items in this shard: 2025-12-04T10:40:37.1938107Z Running 0 items in this shard: 2025-12-04T10:40:37.1938452Z Running 0 items in this shard: 2025-12-04T10:40:37.1938667Z 2025-12-04T10:40:37.1941745Z Running distributed tests for the mpi backend with file init_method 2025-12-04T10:40:37.3200301Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:37.3201846Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:40:37.319871] 2025-12-04T10:40:41.5207221Z 2025-12-04T10:40:41.5208395Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_dc17769dd5c2239f_.log 2025-12-04T10:40:41.5209443Z Running 0 items in this shard: 2025-12-04T10:40:41.5209783Z Running 0 items in this shard: 2025-12-04T10:40:41.5210333Z Running 0 items in this shard: 2025-12-04T10:40:41.5210644Z 2025-12-04T10:40:41.5210921Z Running distributed tests for the nccl backend with env init_method 2025-12-04T10:40:41.5212686Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:41.5216532Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:40:41.521454] 2025-12-04T10:44:28.6748240Z 2025-12-04T10:44:28.6749878Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_3cbdf0379e4c6767_.log 2025-12-04T10:44:28.6770739Z Running 35 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:44:28.6790836Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager 2025-12-04T10:44:28.6792116Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU 2025-12-04T10:44:28.6793467Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last 2025-12-04T10:44:28.6794836Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync 2025-12-04T10:44:28.6796087Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda 2025-12-04T10:44:28.6797337Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T10:44:28.6798576Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max 2025-12-04T10:44:28.6799830Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T10:44:28.6801239Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported 2025-12-04T10:44:28.6802559Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T10:44:28.6803762Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min 2025-12-04T10:44:28.6804894Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T10:44:28.6806011Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async 2025-12-04T10:44:28.6807215Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T10:44:28.6808441Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split 2025-12-04T10:44:28.6809740Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex 2025-12-04T10:44:28.6811104Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group 2025-12-04T10:44:28.6812470Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group 2025-12-04T10:44:28.6813784Z Running 
1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T10:44:28.6815052Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping 2025-12-04T10:44:28.6816404Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph 2025-12-04T10:44:28.6817837Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T10:44:28.6819144Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params 2025-12-04T10:44:28.6820488Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T10:44:28.6822054Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs 2025-12-04T10:44:28.6823222Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T10:44:28.6824335Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T10:44:28.6825502Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup 2025-12-04T10:44:28.6826728Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups 2025-12-04T10:44:28.6828045Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T10:44:28.6829475Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T10:44:28.6830714Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T10:44:28.6831829Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl 2025-12-04T10:44:28.6833303Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu 2025-12-04T10:44:28.6834525Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:44:28.6835420Z 2025-12-04T10:44:28.6835683Z Running distributed tests for the nccl backend with file init_method 2025-12-04T10:44:28.6836177Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:44:28.6837508Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:44:28.676613] 2025-12-04T10:48:15.5931548Z 2025-12-04T10:48:15.5932698Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_25c7f8918b3d0b51_.log 2025-12-04T10:48:15.5955689Z Running 35 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:48:15.5974998Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager 2025-12-04T10:48:15.5976359Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU 2025-12-04T10:48:15.5977936Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last 2025-12-04T10:48:15.5979339Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync 2025-12-04T10:48:15.5988275Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda 2025-12-04T10:48:15.5990302Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T10:48:15.5991640Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max 2025-12-04T10:48:15.5992904Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T10:48:15.5994220Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported 2025-12-04T10:48:15.5995609Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T10:48:15.5996798Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min 2025-12-04T10:48:15.5997947Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T10:48:15.5999074Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async 2025-12-04T10:48:15.6000253Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T10:48:15.6001477Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split 2025-12-04T10:48:15.6002796Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex 2025-12-04T10:48:15.6004171Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group 2025-12-04T10:48:15.6005536Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group 2025-12-04T10:48:15.6006833Z Running 
1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T10:48:15.6008115Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping 2025-12-04T10:48:15.6009463Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph 2025-12-04T10:48:15.6010636Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T10:48:15.6011880Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params 2025-12-04T10:48:15.6013190Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T10:48:15.6014379Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs 2025-12-04T10:48:15.6015539Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T10:48:15.6017011Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T10:48:15.6018247Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup 2025-12-04T10:48:15.6019461Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups 2025-12-04T10:48:15.6020986Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T10:48:15.6022433Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T10:48:15.6023662Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T10:48:15.6024795Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl 2025-12-04T10:48:15.6025958Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu 2025-12-04T10:48:15.6027342Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:48:15.6028108Z 2025-12-04T10:48:15.6028361Z Running distributed tests for the gloo backend with env init_method 2025-12-04T10:48:15.6028880Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:48:15.6030255Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
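The "Executing [...]" entry above records the exact per-shard command the runner uses for this test file. A minimal sketch of replaying that same shard 1/9 invocation outside CI, assuming the same interpreter path and that it is run from the repository's test/ directory as in this log; backend and init-method selection are handled by the surrounding harness (not by these flags), so this only reproduces the argument list as printed:

```python
# Sketch only: replay the shard 1/9 command printed in the log above.
# Assumes /opt/conda/envs/py_3.10/bin/python exists and that the current
# working directory is the repository's test/ directory. The gloo/nccl/mpi
# backend and env/file init_method are chosen by the wrapper that emits
# these log lines, which this snippet does not reproduce.
import subprocess

cmd = [
    "/opt/conda/envs/py_3.10/bin/python", "-bb",
    "distributed/test_distributed_spawn.py",
    "--shard-id=1", "--num-shards=9",
    "-v", "--subprocess", "-vv", "-rfEX",
    "-p", "no:xdist", "--use-pytest", "-x", "--reruns=0",
    "--import-slow-tests", "--import-disabled-tests",
]
result = subprocess.run(cmd, check=False)  # non-zero exit means test failures
print("exit code:", result.returncode)
```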
[2025-12-04 10:48:15.595174] 2025-12-04T10:52:53.4845422Z 2025-12-04T10:52:53.4848674Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_6f55519eb0301937_.log 2025-12-04T10:52:53.4868641Z Running 35 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:52:53.4887916Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager 2025-12-04T10:52:53.4889186Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU 2025-12-04T10:52:53.4890549Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last 2025-12-04T10:52:53.4891913Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync 2025-12-04T10:52:53.4893167Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda 2025-12-04T10:52:53.4894422Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T10:52:53.4895665Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max 2025-12-04T10:52:53.4897288Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T10:52:53.4898676Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported 2025-12-04T10:52:53.4900022Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T10:52:53.4901251Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min 2025-12-04T10:52:53.4902453Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T10:52:53.4903641Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async 2025-12-04T10:52:53.4904858Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T10:52:53.4906123Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split 2025-12-04T10:52:53.4907462Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex 2025-12-04T10:52:53.4908962Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group 2025-12-04T10:52:53.4910323Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group 2025-12-04T10:52:53.4911622Z Running 
1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T10:52:53.4912881Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping 2025-12-04T10:52:53.4914145Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph 2025-12-04T10:52:53.4915349Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T10:52:53.4916609Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params 2025-12-04T10:52:53.4917918Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T10:52:53.4919113Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs 2025-12-04T10:52:53.4920232Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T10:52:53.4921723Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T10:52:53.4922895Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup 2025-12-04T10:52:53.4924118Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups 2025-12-04T10:52:53.4925439Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T10:52:53.4926875Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T10:52:53.4928099Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T10:52:53.4929347Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl 2025-12-04T10:52:53.4930507Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu 2025-12-04T10:52:53.4931799Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:52:53.4932559Z 2025-12-04T10:52:53.4932828Z Running distributed tests for the gloo backend with file init_method 2025-12-04T10:52:53.4933435Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:52:53.4934762Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:52:53.485987] 2025-12-04T10:57:31.0735284Z 2025-12-04T10:57:31.0736530Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_c42c9aaca0d3f434_.log 2025-12-04T10:57:31.0757354Z Running 35 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:57:31.0776557Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager 2025-12-04T10:57:31.0778059Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU 2025-12-04T10:57:31.0779464Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last 2025-12-04T10:57:31.0780848Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync 2025-12-04T10:57:31.0782145Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda 2025-12-04T10:57:31.0783435Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T10:57:31.0784718Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max 2025-12-04T10:57:31.0786005Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T10:57:31.0787410Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported 2025-12-04T10:57:31.0788864Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T10:57:31.0790067Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min 2025-12-04T10:57:31.0791212Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T10:57:31.0792328Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async 2025-12-04T10:57:31.0793507Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T10:57:31.0794738Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split 2025-12-04T10:57:31.0796048Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex 2025-12-04T10:57:31.0797403Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group 2025-12-04T10:57:31.0798761Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group 2025-12-04T10:57:31.0800061Z Running 
1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T10:57:31.0801398Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping 2025-12-04T10:57:31.0802649Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph 2025-12-04T10:57:31.0803818Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T10:57:31.0805082Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params 2025-12-04T10:57:31.0806395Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T10:57:31.0807606Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs 2025-12-04T10:57:31.0808732Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T10:57:31.0809806Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T10:57:31.0813748Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup 2025-12-04T10:57:31.0814919Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups 2025-12-04T10:57:31.0816200Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T10:57:31.0817874Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T10:57:31.0819108Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T10:57:31.0820227Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl 2025-12-04T10:57:31.0821688Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu 2025-12-04T10:57:31.0823018Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:57:31.0823904Z 2025-12-04T10:57:31.0824311Z Finished distributed/test_distributed_spawn 1/9 ... 
[2025-12-04 10:57:31.074722][7482.682638476], took 17.09min 2025-12-04T10:57:31.1014396Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-db161ee1d414a014.xml 2025-12-04T10:57:31.1780421Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aee66205f8817bd7.xml 2025-12-04T10:57:31.2112887Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4fea7b2e6cf3a65.xml 2025-12-04T10:57:31.2355017Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e43b258f943c7149.xml 2025-12-04T10:57:31.2613019Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed8ce545db3785b0.xml 2025-12-04T10:57:31.2877035Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51bd71d27c2db4f0.xml 2025-12-04T10:57:31.3114370Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eeb723e5683986dd.xml 2025-12-04T10:57:31.3418713Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7dd0923a385a5b44.xml 2025-12-04T10:57:31.3699749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875b3394fe6124ff.xml 2025-12-04T10:57:31.4005907Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a01719010801f0eb.xml 2025-12-04T10:57:31.4293721Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-abb38b8b64296782.xml 2025-12-04T10:57:31.4580201Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-35d5d4bfe910714e.xml 2025-12-04T10:57:31.4884101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fcdbe5c8d6246957.xml 2025-12-04T10:57:31.5140063Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f2d32d76cd9ea4c.xml 2025-12-04T10:57:31.5606600Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d01dd7848e58726.xml 2025-12-04T10:57:31.5894737Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b37ec36150974cdc.xml 2025-12-04T10:57:31.6190636Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a5c97ba7476f9699.xml 2025-12-04T10:57:31.6453560Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f7bc9881e047dd1.xml 2025-12-04T10:57:31.6812066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0d8492641a4c3af3.xml 2025-12-04T10:57:31.7124116Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a118777d82e8d7e.xml 2025-12-04T10:57:31.7375311Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6f1779e409eaf9fb.xml 2025-12-04T10:57:31.7694808Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a2c564c0db133fb.xml 2025-12-04T10:57:31.7952543Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4e9ae811cf30c32.xml 2025-12-04T10:57:31.8265633Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a0ffda73db67d0e.xml 2025-12-04T10:57:31.8547188Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b10091684b37c862.xml 2025-12-04T10:57:31.8836260Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-362536b218c78604.xml 2025-12-04T10:57:31.9138572Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2a2b6d5dc912ba1.xml 2025-12-04T10:57:31.9431382Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2bfa612f1908806e.xml 2025-12-04T10:57:31.9715140Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c241632c1bd2254.xml 2025-12-04T10:57:31.9983056Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-300d15ebe169a67d.xml 2025-12-04T10:57:32.0255956Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2664154f3bddb6ff.xml 2025-12-04T10:57:32.0553565Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b262143f686a88dd.xml 2025-12-04T10:57:32.0857135Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c004db07f7b0860b.xml 2025-12-04T10:57:32.1180345Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc18c93bde07fa33.xml 2025-12-04T10:57:32.1500347Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d33e44b619f43cc1.xml 2025-12-04T10:57:32.1804811Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c44272ce3d4ac199.xml 2025-12-04T10:57:32.2069718Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea07358affb5e144.xml 2025-12-04T10:57:32.2366600Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c57c7620876639a.xml 2025-12-04T10:57:32.2636307Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eede0e2726c06cab.xml 2025-12-04T10:57:32.2935109Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a276c210ef7f6689.xml 2025-12-04T10:57:32.3332043Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd59825a029f8f8b.xml 2025-12-04T10:57:32.3653069Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5046dc8bfb623fa3.xml 2025-12-04T10:57:32.3964651Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4878dd0838c676b7.xml 2025-12-04T10:57:32.4255160Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-66566e960af2b7cd.xml 2025-12-04T10:57:32.4524068Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9252bf6025e90d42.xml 2025-12-04T10:57:32.4820716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5b920f5d1c4972a5.xml 2025-12-04T10:57:32.5116750Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41378464ce08003d.xml 2025-12-04T10:57:32.5415206Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee4c603fd47011fa.xml 2025-12-04T10:57:32.5895778Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9973927e7b530617.xml 2025-12-04T10:57:32.6234208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faddb0db331380df.xml 2025-12-04T10:57:32.6523608Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-babf9f26b0f01a05.xml 2025-12-04T10:57:32.6814338Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-682bb4a108ba0cff.xml 2025-12-04T10:57:32.7084664Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0185f9ec4d4c49f.xml 2025-12-04T10:57:32.7368208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-011699f09fdd352f.xml 2025-12-04T10:57:32.7677675Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c6b066059948ead.xml 2025-12-04T10:57:32.7940299Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22fab5f0e190ff66.xml 2025-12-04T10:57:32.8235448Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-55702aa5023cfcc5.xml 2025-12-04T10:57:32.8593528Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccae7814a1c4777f.xml 2025-12-04T10:57:32.8886757Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5bd848f11487517d.xml 2025-12-04T10:57:32.9195127Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27d68b49187eba1f.xml 2025-12-04T10:57:32.9482195Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cf1bc9411dde71e0.xml 2025-12-04T10:57:32.9803068Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-445a5d7115d23df5.xml 2025-12-04T10:57:33.0060276Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-44a168cde9f7a829.xml 2025-12-04T10:57:33.0354873Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1ba388d3de704172.xml 2025-12-04T10:57:33.0642342Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd986c0befb813c2.xml 2025-12-04T10:57:33.0941243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4610efe5376dfca1.xml 2025-12-04T10:57:33.1221282Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8b4358fed50c59f1.xml 2025-12-04T10:57:33.1535132Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-526a02721a1ba5da.xml 2025-12-04T10:57:33.1819733Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c0978e54cc6fc10.xml 2025-12-04T10:57:33.2181329Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf5a35496e65d5e4.xml 2025-12-04T10:57:33.2579714Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee9c4c3ca48fe737.xml 2025-12-04T10:57:33.2876980Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5ca791415d7ead2.xml 2025-12-04T10:57:33.3499728Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4b280a14c5b58c7c.xml 2025-12-04T10:57:33.3823938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f1e7a55058f0a18.xml 2025-12-04T10:57:33.4124614Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c9d23e4c6bbfd6d1.xml 2025-12-04T10:57:33.4444608Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d04adc5353a474ef.xml 2025-12-04T10:57:33.4854499Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c31ce4d4db4e93a.xml 2025-12-04T10:57:33.5155601Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-714862760bd05954.xml 2025-12-04T10:57:33.5447398Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16429bc307938d70.xml 2025-12-04T10:57:33.5754687Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-92f77f3d8cd66053.xml 2025-12-04T10:57:33.6064886Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deed4e34c84ee498.xml 2025-12-04T10:57:33.6379852Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-425b9693fd331423.xml 2025-12-04T10:57:33.6642066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9149f9baa8d84141.xml 2025-12-04T10:57:33.6924244Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d5cc488c73d225.xml 2025-12-04T10:57:33.7713095Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-017a63f22f7a2e26.xml 2025-12-04T10:57:33.8137953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e6391f21f8fa7c0.xml 2025-12-04T10:57:33.8415082Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9e8b675076ef3915.xml 2025-12-04T10:57:33.8695498Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b8d64d4666fb6c9d.xml 2025-12-04T10:57:33.9112828Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0dee982caae0bf52.xml 2025-12-04T10:57:33.9413704Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0df7122c519ced4f.xml 2025-12-04T10:57:33.9846206Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2827e400085e914f.xml 2025-12-04T10:57:34.0140725Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d39e0b557433741.xml 2025-12-04T10:57:34.0484083Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6c5067f69c5dc42.xml 2025-12-04T10:57:34.0775952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d40c5c296523fcf4.xml 2025-12-04T10:57:34.1084368Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e19c088745912810.xml 2025-12-04T10:57:34.1395487Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-21b633b88362af20.xml 2025-12-04T10:57:34.1684888Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f1d69885e8023d73.xml 2025-12-04T10:57:34.2092687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76455ff9fe96f12c.xml 2025-12-04T10:57:34.2408905Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9224f6b7ff8b973c.xml 2025-12-04T10:57:34.2696501Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64019cd840b5ae37.xml 2025-12-04T10:57:34.3005868Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c52c688cda6423d1.xml 2025-12-04T10:57:34.3794331Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-56aae62a7e88ec0a.xml 2025-12-04T10:57:34.4057338Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-126517b1e280f193.xml 2025-12-04T10:57:34.4324262Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d346d213506e58a.xml 2025-12-04T10:57:34.4619701Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-093f4d1e23acb10f.xml 2025-12-04T10:57:34.4939887Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-810e1605bd5350e8.xml 2025-12-04T10:57:34.5340922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-43db9cfa18063736.xml 2025-12-04T10:57:34.5678427Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d256d1cc46d8d8d.xml 2025-12-04T10:57:34.5976015Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0174602e3f0dc49.xml 2025-12-04T10:57:34.6284476Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d15167d0a9773e6.xml 2025-12-04T10:57:34.6574848Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2a355bd7e8aa2084.xml 2025-12-04T10:57:34.6855574Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d86e179dbef96adf.xml 2025-12-04T10:57:34.7154456Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6abc3b994eecaab.xml 2025-12-04T10:57:34.7461784Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f8fe4b288348a5e8.xml 2025-12-04T10:57:34.7747668Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e1865fe4cd352327.xml 2025-12-04T10:57:34.8296410Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d135dba3284d9dd.xml 2025-12-04T10:57:34.8576963Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ce519dd6997621a.xml 2025-12-04T10:57:34.8861667Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d25b88aa16186c5.xml 2025-12-04T10:57:34.9133810Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2b545a8cfb56682b.xml 2025-12-04T10:57:34.9437653Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-96320154d0a3f580.xml 2025-12-04T10:57:34.9891888Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d58d0eb09203fc2c.xml 2025-12-04T10:57:35.0195023Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76e7132ba7ac5de0.xml 2025-12-04T10:57:35.0503185Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a537f0ef8ed460d9.xml 2025-12-04T10:57:35.0802826Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c40fad651035635.xml 2025-12-04T10:57:35.1083553Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68c5b031d9a5ae9e.xml 2025-12-04T10:57:35.1396331Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-712b0b28be8414a0.xml 2025-12-04T10:57:35.1881202Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eca96992921c511.xml 2025-12-04T10:57:35.2175347Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7834531011d91518.xml 2025-12-04T10:57:35.2453888Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68f03a926c8d2bd9.xml 2025-12-04T10:57:35.2735517Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e49faae68d1ac0d9.xml 2025-12-04T10:57:35.2996286Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc4d026c52898da8.xml 2025-12-04T10:57:35.3375807Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03eaa4726076d233.xml 2025-12-04T10:57:35.3807979Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d471afa2e27428d.xml 2025-12-04T10:57:35.4062375Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-065a466bb3b41d27.xml 2025-12-04T10:57:35.4346736Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f328e482896672aa.xml 2025-12-04T10:57:35.4645001Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee7ee7e277bba08f.xml 2025-12-04T10:57:35.5020633Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e55ae93852ba5a41.xml 2025-12-04T10:57:35.5335305Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6750ff7d9a08403d.xml 2025-12-04T10:57:35.5622981Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d85fe03caf11b880.xml 2025-12-04T10:57:35.5898084Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f90e1eb29ec7a7eb.xml 2025-12-04T10:57:35.6167227Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c515ad73db9ec0f.xml 2025-12-04T10:57:35.6476625Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-be5d3342961d1397.xml 2025-12-04T10:57:35.6762701Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81a8ca35b73b2608.xml 2025-12-04T10:57:35.7017759Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6eb3b25e1011068f.xml 2025-12-04T10:57:35.7301924Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16ab3c0f531a2710.xml 2025-12-04T10:57:35.7575260Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e912af285a88a53.xml 2025-12-04T10:57:35.7844035Z Running distributed/test_distributed_spawn 4/9 ... [2025-12-04 10:57:35.783858][7487.391774661] 2025-12-04T10:57:35.7844801Z Running distributed tests for the test backend with env init_method 2025-12-04T10:57:35.7845311Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:57:35.7847794Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:57:35.784582] 2025-12-04T10:57:39.3630208Z 2025-12-04T10:57:39.3631346Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_cfb55a01555794b3_.log 2025-12-04T10:57:39.3632426Z Running 0 items in this shard: 2025-12-04T10:57:39.3632766Z 2025-12-04T10:57:39.3637986Z Running distributed tests for the test backend with file init_method 2025-12-04T10:57:39.3639753Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:57:39.3643537Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:57:39.364165] 2025-12-04T10:57:42.9325367Z 2025-12-04T10:57:42.9326508Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_5d1f467e5bbdaff2_.log 2025-12-04T10:57:42.9327604Z Running 0 items in this shard: 2025-12-04T10:57:42.9327838Z 2025-12-04T10:57:42.9332760Z Running distributed tests for the mpi backend with env init_method 2025-12-04T10:57:43.0577988Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:57:43.0579750Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
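The long run of "Parsing testcases for test report: ...xml" entries above corresponds to reading the per-process JUnit-style XML reports written under test/test-reports/dist-*-init-*/. The actual CI parser is not shown in this log; a minimal, hypothetical sketch of summarising one such report with the standard library looks like this (the path in the usage comment is illustrative):

```python
# Hypothetical sketch: tally tests/failures/errors/skips from one JUnit-style
# XML report such as the dist-*-init-* files listed above. This is not the
# tool the CI job itself uses; it only illustrates the report format.
import xml.etree.ElementTree as ET

def summarize_report(path: str) -> dict:
    root = ET.parse(path).getroot()
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    # iter() matches the root itself when the file has a bare <testsuite>,
    # and all children when it is wrapped in <testsuites>.
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals

# Example usage (illustrative path):
# print(summarize_report("test/test-reports/dist-gloo-init-env/report.xml"))
```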
[2025-12-04 10:57:43.057584] 2025-12-04T10:57:47.2123115Z 2025-12-04T10:57:47.2124282Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_b5a10ee12046d5b9_.log 2025-12-04T10:57:47.2125386Z Running 0 items in this shard: 2025-12-04T10:57:47.2125737Z Running 0 items in this shard: 2025-12-04T10:57:47.2126073Z Running 0 items in this shard: 2025-12-04T10:57:47.2126282Z 2025-12-04T10:57:47.2128378Z Running distributed tests for the mpi backend with file init_method 2025-12-04T10:57:47.3386971Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:57:47.3390828Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:57:47.338839] 2025-12-04T10:57:51.5222397Z 2025-12-04T10:57:51.5223570Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_de48cc4d8d8e3c13_.log 2025-12-04T10:57:51.5224892Z Running 0 items in this shard: 2025-12-04T10:57:51.5225233Z Running 0 items in this shard: 2025-12-04T10:57:51.5225565Z Running 0 items in this shard: 2025-12-04T10:57:51.5225772Z 2025-12-04T10:57:51.5230203Z Running distributed tests for the nccl backend with env init_method 2025-12-04T10:57:51.5231869Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:57:51.5235956Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:57:51.523410] 2025-12-04T11:01:37.4701414Z 2025-12-04T11:01:37.4705336Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_5fb338ab863a3c8f_.log 2025-12-04T11:01:37.4723028Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:01:37.4741116Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager 2025-12-04T11:01:37.4742723Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input 2025-12-04T11:01:37.4744310Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T11:01:37.4745944Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view 2025-12-04T11:01:37.4747289Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup 2025-12-04T11:01:37.4748704Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T11:01:37.4750000Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T11:01:37.4751251Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T11:01:37.4752452Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min 2025-12-04T11:01:37.4753619Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T11:01:37.4754741Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product 2025-12-04T11:01:37.4755848Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T11:01:37.4756940Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T11:01:37.4758176Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda 2025-12-04T11:01:37.4759453Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda 2025-12-04T11:01:37.4760733Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err 2025-12-04T11:01:37.4761876Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T11:01:37.4762966Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager 2025-12-04T11:01:37.4764260Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad 2025-12-04T11:01:37.4765530Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph 2025-12-04T11:01:37.4766689Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook 2025-12-04T11:01:37.4767908Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD 2025-12-04T11:01:37.4769104Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T11:01:37.4770365Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T11:01:37.4771590Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states 2025-12-04T11:01:37.4772798Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T11:01:37.4773932Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T11:01:37.4775023Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T11:01:37.4776209Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product 2025-12-04T11:01:37.4777681Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min 2025-12-04T11:01:37.4778818Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:01:37.4779446Z 2025-12-04T11:01:37.4779699Z Running distributed tests for the nccl backend with file init_method 2025-12-04T11:01:37.4780216Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:01:37.4781584Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:01:37.472191] 2025-12-04T11:05:23.4625233Z 2025-12-04T11:05:23.4626380Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_024341bf790fe69a_.log 2025-12-04T11:05:23.4644091Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:05:23.4661858Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager 2025-12-04T11:05:23.4663469Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input 2025-12-04T11:05:23.4665055Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T11:05:23.4666626Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view 2025-12-04T11:05:23.4667967Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup 2025-12-04T11:05:23.4669360Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T11:05:23.4670659Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T11:05:23.4671908Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T11:05:23.4673095Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min 2025-12-04T11:05:23.4674343Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T11:05:23.4675500Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product 2025-12-04T11:05:23.4676597Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T11:05:23.4677697Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T11:05:23.4678941Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda 2025-12-04T11:05:23.4680223Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda 2025-12-04T11:05:23.4681428Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err 2025-12-04T11:05:23.4682571Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T11:05:23.4683697Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager 2025-12-04T11:05:23.4685004Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad 2025-12-04T11:05:23.4686291Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph 2025-12-04T11:05:23.4687453Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook 2025-12-04T11:05:23.4688670Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD 2025-12-04T11:05:23.4689871Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T11:05:23.4691145Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T11:05:23.4692382Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states 2025-12-04T11:05:23.4693579Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T11:05:23.4694724Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T11:05:23.4695828Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T11:05:23.4697294Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product 2025-12-04T11:05:23.4698485Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min 2025-12-04T11:05:23.4699635Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:05:23.4700279Z 2025-12-04T11:05:23.4700531Z Running distributed tests for the gloo backend with env init_method 2025-12-04T11:05:23.4701050Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:05:23.4702404Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:05:23.464541] 2025-12-04T11:09:43.3178282Z 2025-12-04T11:09:43.3179725Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_807ef3b254ee9578_.log 2025-12-04T11:09:43.3197233Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:09:43.3214392Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager 2025-12-04T11:09:43.3215958Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input 2025-12-04T11:09:43.3217838Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T11:09:43.3219568Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view 2025-12-04T11:09:43.3221129Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup 2025-12-04T11:09:43.3222429Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T11:09:43.3223780Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T11:09:43.3225070Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T11:09:43.3226308Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min 2025-12-04T11:09:43.3227503Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T11:09:43.3228680Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product 2025-12-04T11:09:43.3229895Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T11:09:43.3231028Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T11:09:43.3232300Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda 2025-12-04T11:09:43.3233705Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda 2025-12-04T11:09:43.3234921Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err 2025-12-04T11:09:43.3236072Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T11:09:43.3237198Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager 2025-12-04T11:09:43.3238501Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad 2025-12-04T11:09:43.3239786Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph 2025-12-04T11:09:43.3240950Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook 2025-12-04T11:09:43.3242151Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD 2025-12-04T11:09:43.3243352Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T11:09:43.3244584Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T11:09:43.3245821Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states 2025-12-04T11:09:43.3247013Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T11:09:43.3248150Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T11:09:43.3249262Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T11:09:43.3250546Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product 2025-12-04T11:09:43.3251711Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min 2025-12-04T11:09:43.3252813Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:09:43.3253443Z 2025-12-04T11:09:43.3253691Z Running distributed tests for the gloo backend with file init_method 2025-12-04T11:09:43.3254196Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:09:43.3255525Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:09:43.319508] 2025-12-04T11:14:03.4124625Z 2025-12-04T11:14:03.4125764Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_a98bc48b8a2bbb0a_.log 2025-12-04T11:14:03.4143800Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:14:03.4161359Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager 2025-12-04T11:14:03.4162924Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input 2025-12-04T11:14:03.4164465Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T11:14:03.4165991Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view 2025-12-04T11:14:03.4167294Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup 2025-12-04T11:14:03.4168614Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T11:14:03.4169911Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T11:14:03.4171158Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T11:14:03.4172359Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min 2025-12-04T11:14:03.4173552Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T11:14:03.4174694Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product 2025-12-04T11:14:03.4175823Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T11:14:03.4177186Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T11:14:03.4178465Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda 2025-12-04T11:14:03.4179791Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda 2025-12-04T11:14:03.4181034Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err 2025-12-04T11:14:03.4182222Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T11:14:03.4183349Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager 2025-12-04T11:14:03.4184685Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad 2025-12-04T11:14:03.4186001Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph 2025-12-04T11:14:03.4187194Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook 2025-12-04T11:14:03.4188580Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD 2025-12-04T11:14:03.4189921Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T11:14:03.4191107Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T11:14:03.4192302Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states 2025-12-04T11:14:03.4193475Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T11:14:03.4194582Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T11:14:03.4195637Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T11:14:03.4196790Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product 2025-12-04T11:14:03.4197913Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min 2025-12-04T11:14:03.4199020Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:14:03.4199610Z 2025-12-04T11:14:03.4200093Z Finished distributed/test_distributed_spawn 4/9 ... 
[2025-12-04 11:14:03.413916][8475.021831248], took 16.46min 2025-12-04T11:14:03.4412464Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-422b22169e3a08f1.xml 2025-12-04T11:14:03.5241724Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ec15082b412f697.xml 2025-12-04T11:14:03.5512497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a2eda26248d83b8e.xml 2025-12-04T11:14:03.5755671Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-72f602b330e606cb.xml 2025-12-04T11:14:03.6024232Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-94537227bc12f698.xml 2025-12-04T11:14:03.6290074Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7368dd24235350f.xml 2025-12-04T11:14:03.6596516Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f5a9742e1242440.xml 2025-12-04T11:14:03.6917072Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6b0873e59b83bf9a.xml 2025-12-04T11:14:03.7286908Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64bbf1c836e72a15.xml 2025-12-04T11:14:03.7625371Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83300f2b97b0a07.xml 2025-12-04T11:14:03.8368399Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-46e1a3ccabb4ea53.xml 2025-12-04T11:14:03.8700116Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52cd579e7fe5892c.xml 2025-12-04T11:14:03.9031137Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb876d9d148638c4.xml 2025-12-04T11:14:03.9321492Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-419043608d870248.xml 2025-12-04T11:14:03.9627186Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03caaef3ff0396d9.xml 2025-12-04T11:14:03.9937664Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a49158b49188737a.xml 2025-12-04T11:14:04.0469642Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9371e4128a3ac8fe.xml 2025-12-04T11:14:04.0859312Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf7e7c630fc800f5.xml 2025-12-04T11:14:04.1217466Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f263367a9b8ff205.xml 2025-12-04T11:14:04.1538723Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9da5cc1abf82fc88.xml 2025-12-04T11:14:04.1925121Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17270d7c5dcce82d.xml 2025-12-04T11:14:04.2289882Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a8a0406f3c10fb.xml 2025-12-04T11:14:04.2568105Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8955835fa53fe405.xml 2025-12-04T11:14:04.2857537Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41e8000da4470974.xml 2025-12-04T11:14:04.3577806Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17b82ffe3c62718d.xml 2025-12-04T11:14:04.3946937Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-550a077945687423.xml 2025-12-04T11:14:04.4337658Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-97658b25492d180c.xml 2025-12-04T11:14:04.4739132Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5ba6b434230b8a31.xml 2025-12-04T11:14:04.5127826Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ab85cfcce385bb9.xml 2025-12-04T11:14:04.5499042Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205c67b3e9ea2006.xml 2025-12-04T11:14:04.6179301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a7727ff60499e455.xml 2025-12-04T11:14:04.7318032Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5545774781103441.xml 2025-12-04T11:14:04.7707780Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69b99129eec5d274.xml 2025-12-04T11:14:05.0124581Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-71229775f4c708c6.xml 2025-12-04T11:14:05.0498716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ef94932e8a93743e.xml 2025-12-04T11:14:05.0809207Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-830e1894dcf5c994.xml 2025-12-04T11:14:05.1099357Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d6ec9fe8576de151.xml 2025-12-04T11:14:05.1619870Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dd5c3fba431f03e3.xml 2025-12-04T11:14:05.1904994Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23246ae737e62ded.xml 2025-12-04T11:14:05.2226548Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aa7ae0f58f2813b.xml 2025-12-04T11:14:05.2559649Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cd7e251b7cd67b87.xml 2025-12-04T11:14:05.2924541Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ffef4b2a54e0ec6.xml 2025-12-04T11:14:05.3272367Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f47719c8fab0f3fd.xml 2025-12-04T11:14:05.3631861Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7f97df23e3af62b7.xml 2025-12-04T11:14:05.3961006Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d9b569377c5e6b5.xml 2025-12-04T11:14:05.4299075Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e79d7fc843c87404.xml 2025-12-04T11:14:05.4979085Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b4908c887012bf3.xml 2025-12-04T11:14:05.5290087Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-15d9380e1c9a62c7.xml 2025-12-04T11:14:05.5597070Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-89d48b8548171ec2.xml 2025-12-04T11:14:05.5933577Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e87d273ae3e5c7f4.xml 2025-12-04T11:14:05.6224684Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5becb9fcc2b2a740.xml 2025-12-04T11:14:05.6658882Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e50500c3a0076f9a.xml 2025-12-04T11:14:05.7009910Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c28f45efdfac39c4.xml 2025-12-04T11:14:05.7378961Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9fcea5b98362b6a.xml 2025-12-04T11:14:05.7703269Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23763de39322c899.xml 2025-12-04T11:14:05.8057504Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7a5837d4cf564eb.xml 2025-12-04T11:14:05.8390073Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6098aefa2030078.xml 2025-12-04T11:14:06.0779837Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d3b389690949ffc.xml 2025-12-04T11:14:06.1125932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-00c0b12dc56300ed.xml 2025-12-04T11:14:06.3550489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875462dd555a5412.xml 2025-12-04T11:14:06.3842497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5da26e78fc052180.xml 2025-12-04T11:14:06.4159366Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-705b7a3606470644.xml 2025-12-04T11:14:06.4831303Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3996750239d4977f.xml 2025-12-04T11:14:06.5157593Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b1bfbeb9b34c8574.xml 2025-12-04T11:14:06.5488068Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c5cc720d34bebc6.xml 2025-12-04T11:14:06.5802502Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5eb76bc9735e309.xml 2025-12-04T11:14:06.6119264Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a28d2b8c4bb8b97.xml 2025-12-04T11:14:06.6478311Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2fa0ff1a8410ed4.xml 2025-12-04T11:14:06.6820508Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a694586bb28814d4.xml 2025-12-04T11:14:06.7156653Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-91f11f0cc30a0889.xml 2025-12-04T11:14:06.7550159Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc882534d0c7ac9e.xml 2025-12-04T11:14:06.7898884Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3576431fa0a79154.xml 2025-12-04T11:14:06.8195691Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e1893ad67dccf3.xml 2025-12-04T11:14:06.8517498Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-148510b891c749c6.xml 2025-12-04T11:14:06.8809813Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6549972a7efaf11.xml 2025-12-04T11:14:07.1196057Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ea6ea860d10e295.xml 2025-12-04T11:14:07.1498662Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-83ab4f7124e50996.xml 2025-12-04T11:14:07.1818879Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6c1a924e8712f89.xml 2025-12-04T11:14:07.2127352Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0bec6d0d6dd273b2.xml 2025-12-04T11:14:07.2456786Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ce5c2131a079a118.xml 2025-12-04T11:14:07.2778047Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9aa0d7a04a1b05f2.xml 2025-12-04T11:14:07.3088288Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e0e890e418ce3a.xml 2025-12-04T11:14:07.3431351Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4cffe073269e4f0a.xml 2025-12-04T11:14:07.3761953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fb78beccd38dd26e.xml 2025-12-04T11:14:07.4081509Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c24763a200436369.xml 2025-12-04T11:14:07.4474987Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95f84fd6ea33eee0.xml 2025-12-04T11:14:07.4769950Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-88fe6d3cec93de32.xml 2025-12-04T11:14:07.5088674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0260bf01f397061e.xml 2025-12-04T11:14:07.5378889Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc07ca8676eed412.xml 2025-12-04T11:14:07.5757749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c73c9ddbbd799146.xml 2025-12-04T11:14:07.6068441Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d73e4a124891508d.xml 2025-12-04T11:14:07.6457111Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e44eef95a4d81dc3.xml 2025-12-04T11:14:07.6787467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d0f5373874b1c4.xml 2025-12-04T11:14:07.7070243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c88483e90b04648.xml 2025-12-04T11:14:07.7365479Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccf199cbc8b611ab.xml 2025-12-04T11:14:07.7717624Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6a4daccc9da30cdb.xml 2025-12-04T11:14:07.8011002Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d983aecef8c58dfb.xml 2025-12-04T11:14:07.8269054Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-746325984b31e17e.xml 2025-12-04T11:14:07.8618201Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b8591cc84ef2a6a.xml 2025-12-04T11:14:07.9044831Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-043dda7312ce02a9.xml 2025-12-04T11:14:07.9578679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3cf2335721c75edb.xml 2025-12-04T11:14:08.0869922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed68ee99b507df29.xml 2025-12-04T11:14:08.1230571Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-afe3aa9ea643db5b.xml 2025-12-04T11:14:08.2219034Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-706ef1f553cb8cca.xml 2025-12-04T11:14:08.2578117Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a98124b8f8d7b3ef.xml 2025-12-04T11:14:08.2968321Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee37bb64a8e84ec5.xml 2025-12-04T11:14:08.3850291Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2af230e2fec6d35.xml 2025-12-04T11:14:08.4269051Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3008545966a2ad5b.xml 2025-12-04T11:14:08.4626237Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53870facd803211b.xml 2025-12-04T11:14:08.4979816Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4eca7697caf90c2a.xml 2025-12-04T11:14:08.5432749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c4554d604268fb5.xml 2025-12-04T11:14:08.5807953Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c6b52be0b4531e90.xml 2025-12-04T11:14:08.6187791Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c63a3f0987273dba.xml 2025-12-04T11:14:08.6596825Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b58af3771e34dd96.xml 2025-12-04T11:14:08.7018682Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-587b09149e6cc83f.xml 2025-12-04T11:14:08.7488590Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e3786dc33e6abd50.xml 2025-12-04T11:14:08.7950522Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dfce7e92d72e48a2.xml 2025-12-04T11:14:08.8769233Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-627617d506ff1d2f.xml 2025-12-04T11:14:08.9133698Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64530dfd24199eb7.xml 2025-12-04T11:14:08.9459024Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ddc33c5ddc10dde.xml 2025-12-04T11:14:08.9749561Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0632db0896072cf.xml 2025-12-04T11:14:09.0137244Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-edeb0bbc0394ec67.xml 2025-12-04T11:14:09.0539116Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e515d47fe2e6fb9c.xml 2025-12-04T11:14:09.0928735Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c4f0278f004bb5c.xml 2025-12-04T11:14:09.1288986Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c0d3bae257da8444.xml 2025-12-04T11:14:09.1792631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7025af433f00efbb.xml 2025-12-04T11:14:09.2098757Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49fd198402d5c655.xml 2025-12-04T11:14:09.2459603Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5277c0b0a803851c.xml 2025-12-04T11:14:09.2858962Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d4c61b2ce73c677.xml 2025-12-04T11:14:09.3225643Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb0710cc3c031aa2.xml 2025-12-04T11:14:09.8424042Z Uploading artifacts took 0.49 seconds 2025-12-04T11:14:09.8431540Z Running distributed/test_distributed_spawn 7/9 ... [2025-12-04 11:14:09.842757][8481.450673212] 2025-12-04T11:14:09.8432253Z Running distributed tests for the test backend with env init_method 2025-12-04T11:14:09.8433083Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:14:09.8436922Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:14:09.843494] 2025-12-04T11:14:13.4195652Z 2025-12-04T11:14:13.4197032Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_e6318e4f5e3f044b_.log 2025-12-04T11:14:13.4198182Z Running 0 items in this shard: 2025-12-04T11:14:13.4198407Z 2025-12-04T11:14:13.4199734Z Running distributed tests for the test backend with file init_method 2025-12-04T11:14:13.4201378Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:14:13.4205121Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:14:13.420322] 2025-12-04T11:14:16.9927627Z 2025-12-04T11:14:16.9928797Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_7d14db48d459fad6_.log 2025-12-04T11:14:16.9929868Z Running 0 items in this shard: 2025-12-04T11:14:16.9930083Z 2025-12-04T11:14:16.9935457Z Running distributed tests for the mpi backend with env init_method 2025-12-04T11:14:17.1213488Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:14:17.1215617Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:14:17.121189] 2025-12-04T11:14:21.2925923Z 2025-12-04T11:14:21.2927042Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_867e6ca715844bef_.log 2025-12-04T11:14:21.2928370Z Running 0 items in this shard: 2025-12-04T11:14:21.2928780Z Running 0 items in this shard: 2025-12-04T11:14:21.2929124Z Running 0 items in this shard: 2025-12-04T11:14:21.2929350Z 2025-12-04T11:14:21.2934207Z Running distributed tests for the mpi backend with file init_method 2025-12-04T11:14:21.4214525Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:14:21.4216169Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:14:21.421275] 2025-12-04T11:14:25.6263743Z 2025-12-04T11:14:25.6264872Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_e3e9b753abf00510_.log 2025-12-04T11:14:25.6265938Z Running 0 items in this shard: 2025-12-04T11:14:25.6266297Z Running 0 items in this shard: 2025-12-04T11:14:25.6266636Z Running 0 items in this shard: 2025-12-04T11:14:25.6266857Z 2025-12-04T11:14:25.6271115Z Running distributed tests for the nccl backend with env init_method 2025-12-04T11:14:25.6272844Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:14:25.6276842Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
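
Note: the "Executing [...]" entries above record the exact per-shard command line. As a rough way to replay the nccl/env run for this shard outside CI, assuming a PyTorch source checkout with a built install, the same interpreter path, and that distributed/test_distributed_spawn.py selects its backend and init method from BACKEND and INIT_METHOD environment variables (an assumption inferred from the log, not taken from the CI scripts), one could do something like:

import os
import subprocess

# Hypothetical local replay of the shard 7/9 nccl + env:// run logged above.
# The working directory and the BACKEND / INIT_METHOD variables are assumptions
# based on the log text; the argv is copied verbatim from the "Executing" entry.
env = dict(os.environ, BACKEND="nccl", INIT_METHOD="env://")
cmd = [
    "/opt/conda/envs/py_3.10/bin/python", "-bb",
    "distributed/test_distributed_spawn.py",
    "--shard-id=7", "--num-shards=9",
    "-v", "--subprocess", "-vv", "-rfEX",
    "-p", "no:xdist", "--use-pytest", "-x", "--reruns=0",
    "--import-slow-tests", "--import-disabled-tests",
]
subprocess.run(cmd, cwd="/var/lib/jenkins/workspace/test", env=env, check=True)
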
[2025-12-04 11:14:25.627494] 2025-12-04T11:18:40.5506209Z 2025-12-04T11:18:40.5507479Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_57c28f64236fb5f7_.log 2025-12-04T11:18:40.5528307Z Running 33 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:18:40.5548789Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class 2025-12-04T11:18:40.5550083Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T11:18:40.5551539Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T11:18:40.5552907Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook 2025-12-04T11:18:40.5554155Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T11:18:40.5555365Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min 2025-12-04T11:18:40.5556725Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T11:18:40.5558055Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group 2025-12-04T11:18:40.5559247Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda 2025-12-04T11:18:40.5560548Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda 2025-12-04T11:18:40.5561747Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T11:18:40.5562801Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda 2025-12-04T11:18:40.5563916Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group 2025-12-04T11:18:40.5565148Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T11:18:40.5566363Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list 2025-12-04T11:18:40.5567532Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async 2025-12-04T11:18:40.5568877Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger 2025-12-04T11:18:40.5570300Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T11:18:40.5571837Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn 2025-12-04T11:18:40.5573042Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group 2025-12-04T11:18:40.5574170Z 
Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params 2025-12-04T11:18:40.5575290Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future 2025-12-04T11:18:40.5576459Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph 2025-12-04T11:18:40.5577818Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler 2025-12-04T11:18:40.5579078Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order 2025-12-04T11:18:40.5580411Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce 2025-12-04T11:18:40.5581578Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum 2025-12-04T11:18:40.5582653Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda 2025-12-04T11:18:40.5583862Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler 2025-12-04T11:18:40.5585168Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T11:18:40.5586421Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler 2025-12-04T11:18:40.5587730Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T11:18:40.5589250Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:18:40.5589950Z 2025-12-04T11:18:40.5590190Z Running distributed tests for the nccl backend with file init_method 2025-12-04T11:18:40.5590675Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:18:40.5591963Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:18:40.552460] 2025-12-04T11:22:55.3092749Z 2025-12-04T11:22:55.3094091Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_e15417bf2d6aa02d_.log 2025-12-04T11:22:55.3113218Z Running 33 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:22:55.3131777Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class 2025-12-04T11:22:55.3133200Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T11:22:55.3134591Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T11:22:55.3135974Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook 2025-12-04T11:22:55.3137489Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T11:22:55.3138738Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min 2025-12-04T11:22:55.3140091Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T11:22:55.3141385Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group 2025-12-04T11:22:55.3144159Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda 2025-12-04T11:22:55.3145515Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda 2025-12-04T11:22:55.3146792Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T11:22:55.3147858Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda 2025-12-04T11:22:55.3149178Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group 2025-12-04T11:22:55.3150378Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T11:22:55.3151578Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list 2025-12-04T11:22:55.3152707Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async 2025-12-04T11:22:55.3154079Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger 2025-12-04T11:22:55.3155402Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T11:22:55.3156637Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn 2025-12-04T11:22:55.3157806Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group 2025-12-04T11:22:55.3158918Z 
Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params 2025-12-04T11:22:55.3160020Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future 2025-12-04T11:22:55.3161139Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph 2025-12-04T11:22:55.3162259Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler 2025-12-04T11:22:55.3163449Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order 2025-12-04T11:22:55.3164660Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce 2025-12-04T11:22:55.3165765Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum 2025-12-04T11:22:55.3166859Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda 2025-12-04T11:22:55.3168132Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler 2025-12-04T11:22:55.3169359Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T11:22:55.3170541Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler 2025-12-04T11:22:55.3171757Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T11:22:55.3173023Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:22:55.3173728Z 2025-12-04T11:22:55.3173995Z Running distributed tests for the gloo backend with env init_method 2025-12-04T11:22:55.3174507Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:22:55.3175803Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:22:55.310735] 2025-12-04T11:26:59.6765996Z 2025-12-04T11:26:59.6767003Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_7faf7d03bb4df9a2_.log 2025-12-04T11:26:59.6785758Z Running 33 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:26:59.6804022Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class 2025-12-04T11:26:59.6805328Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T11:26:59.6806728Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T11:26:59.6808086Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook 2025-12-04T11:26:59.6809332Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T11:26:59.6810551Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min 2025-12-04T11:26:59.6811884Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T11:26:59.6813170Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group 2025-12-04T11:26:59.6814359Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda 2025-12-04T11:26:59.6815664Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda 2025-12-04T11:26:59.6817137Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T11:26:59.6818208Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda 2025-12-04T11:26:59.6819370Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group 2025-12-04T11:26:59.6820690Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T11:26:59.6822188Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list 2025-12-04T11:26:59.6823391Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async 2025-12-04T11:26:59.6824798Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger 2025-12-04T11:26:59.6826191Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T11:26:59.6827517Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn 2025-12-04T11:26:59.6828759Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group 2025-12-04T11:26:59.6829927Z 
Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params 2025-12-04T11:26:59.6831088Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future 2025-12-04T11:26:59.6832225Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph 2025-12-04T11:26:59.6833492Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler 2025-12-04T11:26:59.6834792Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order 2025-12-04T11:26:59.6836080Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce 2025-12-04T11:26:59.6837219Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum 2025-12-04T11:26:59.6838279Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda 2025-12-04T11:26:59.6839438Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler 2025-12-04T11:26:59.6840695Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T11:26:59.6841912Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler 2025-12-04T11:26:59.6843162Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T11:26:59.6844454Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:26:59.6845223Z 2025-12-04T11:26:59.6845471Z Running distributed tests for the gloo backend with file init_method 2025-12-04T11:26:59.6845976Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:26:59.6847300Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
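
Note: the shard listings above ("Running 33 items in this shard" for the nccl and gloo backends, then one "Running 1 items in this shard" entry per test subprocess) come from the --shard-id=7 --num-shards=9 split. The sketch below only illustrates the general idea of bucketing a test list into shards; it is not PyTorch's actual sharding logic, which lives in the repository's test tooling.

# Illustrative only: a naive round-robin shard assignment over a sorted test list.
def shard(items, shard_id, num_shards):
    """Return the subset of `items` that a 1-based `shard_id` would run."""
    ordered = sorted(items)
    return [t for i, t in enumerate(ordered) if i % num_shards == shard_id - 1]

# Toy usage: 300 synthetic test ids split 9 ways, picking shard 7.
tests = [f"test_case_{i:03d}" for i in range(300)]
mine = shard(tests, shard_id=7, num_shards=9)
print(len(mine), mine[:3])
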
[2025-12-04 11:26:59.678112] 2025-12-04T11:31:03.8198200Z 2025-12-04T11:31:03.8199421Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_99251297b874e698_.log 2025-12-04T11:31:03.8218174Z Running 33 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:31:03.8236775Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class 2025-12-04T11:31:03.8238035Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T11:31:03.8239395Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T11:31:03.8240756Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook 2025-12-04T11:31:03.8242015Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T11:31:03.8243192Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min 2025-12-04T11:31:03.8244486Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T11:31:03.8245703Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group 2025-12-04T11:31:03.8246857Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda 2025-12-04T11:31:03.8248122Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda 2025-12-04T11:31:03.8249296Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T11:31:03.8250315Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda 2025-12-04T11:31:03.8251393Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group 2025-12-04T11:31:03.8252575Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T11:31:03.8253777Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list 2025-12-04T11:31:03.8254914Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async 2025-12-04T11:31:03.8256315Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger 2025-12-04T11:31:03.8257944Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T11:31:03.8259262Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn 2025-12-04T11:31:03.8260511Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group 2025-12-04T11:31:03.8261674Z 
Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params 2025-12-04T11:31:03.8262814Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future 2025-12-04T11:31:03.8263952Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph 2025-12-04T11:31:03.8265152Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler 2025-12-04T11:31:03.8266476Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order 2025-12-04T11:31:03.8267750Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce 2025-12-04T11:31:03.8269026Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum 2025-12-04T11:31:03.8270156Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda 2025-12-04T11:31:03.8271295Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler 2025-12-04T11:31:03.8272508Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T11:31:03.8273720Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler 2025-12-04T11:31:03.8274932Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T11:31:03.8276201Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:31:03.8276888Z 2025-12-04T11:31:03.8277284Z Finished distributed/test_distributed_spawn 7/9 ... 
[2025-12-04 11:31:03.820755][9495.428670826], took 16.90min 2025-12-04T11:31:03.8489666Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e12df5e946a2399b.xml 2025-12-04T11:31:03.9270369Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4ab25792bd6780ce.xml 2025-12-04T11:31:03.9577253Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee61fca4ae363844.xml 2025-12-04T11:31:03.9859236Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-12e19ecac0707a9f.xml 2025-12-04T11:31:04.0212281Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49aeb17bc0069227.xml 2025-12-04T11:31:04.0517084Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-82678a9127d50625.xml 2025-12-04T11:31:04.0895612Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ac8ca9bd1994ece.xml 2025-12-04T11:31:04.1280894Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3403d5bb8935cb4e.xml 2025-12-04T11:31:04.1752398Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0c166deb400ad9d.xml 2025-12-04T11:31:04.2168656Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-60e4e17b51df739f.xml 2025-12-04T11:31:04.2537150Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22eb7410be2437d9.xml 2025-12-04T11:31:04.2918769Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9ee70791b9debd6c.xml 2025-12-04T11:31:04.3259657Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81abecf194df2c45.xml 2025-12-04T11:31:04.3570157Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1136154023961765.xml 2025-12-04T11:31:04.3929906Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cfef205e8493de16.xml 2025-12-04T11:31:04.4236731Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd599f355b8caaeb.xml 2025-12-04T11:31:04.4556543Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62ca7bd8b65dea10.xml 2025-12-04T11:31:04.4924448Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b3d3e55cfe315fc5.xml 2025-12-04T11:31:04.5404685Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a45eb631d6c35ef.xml 2025-12-04T11:31:04.6359301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aae6fb78854ea6ff.xml 2025-12-04T11:31:04.6789932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9eef2c9b45729eeb.xml 2025-12-04T11:31:04.7295457Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d106ae3bbe7d9e5c.xml 2025-12-04T11:31:04.7858262Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ff643138d43dd85.xml 2025-12-04T11:31:04.8257135Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c72d0c28afc7b8b.xml 2025-12-04T11:31:04.8708287Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8cb6ed13882ace9d.xml 2025-12-04T11:31:04.9107730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51d5ea88c29b6ed7.xml 2025-12-04T11:31:04.9627487Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0e2af92baadfb43c.xml 2025-12-04T11:31:05.0037777Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ee64e4888310471.xml 2025-12-04T11:31:05.0349840Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2124f6a7f1f8a6ad.xml 2025-12-04T11:31:05.0758861Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a72595ddb271e95.xml 2025-12-04T11:31:05.1103550Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f5a0fd7e9efb76d5.xml 2025-12-04T11:31:05.1509690Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f05ec777ac110fb6.xml 2025-12-04T11:31:05.2032418Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c4dbe227aaf8cd2.xml 2025-12-04T11:31:05.2355227Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d8d80edc2b8c69e.xml 2025-12-04T11:31:05.2641231Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-50add8f3174dd7ac.xml 2025-12-04T11:31:05.3040893Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-851cdc069dcc69f7.xml 2025-12-04T11:31:05.3450959Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1acd79e907003b41.xml 2025-12-04T11:31:05.3870301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0ff1f71f9283f58.xml 2025-12-04T11:31:05.4189200Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-65237f33092a4b4f.xml 2025-12-04T11:31:05.4614429Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42750e8459e7d15b.xml 2025-12-04T11:31:05.5089216Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d44ddde7846d301e.xml 2025-12-04T11:31:05.5605305Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d84034c24f131de9.xml 2025-12-04T11:31:05.6047805Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b21382e4a0d075d7.xml 2025-12-04T11:31:05.6619972Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f01856e9a2028bff.xml 2025-12-04T11:31:05.7037542Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d271f82508cdd35e.xml 2025-12-04T11:31:05.7361087Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-602ab3c67d585e00.xml 2025-12-04T11:31:05.7669474Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c4b4f500cbe46b2.xml 2025-12-04T11:31:05.8037747Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-060bfe393d18a7b7.xml 2025-12-04T11:31:05.8389940Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-08a6cb454dfb3288.xml 2025-12-04T11:31:05.8739291Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14f8591ab0b18d47.xml 2025-12-04T11:31:05.9071670Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faf65bc8adad7023.xml 2025-12-04T11:31:05.9390802Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ab921a38daba1bb.xml 2025-12-04T11:31:05.9748673Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205a17c445d16b08.xml 2025-12-04T11:31:06.0132299Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14314f5e6064defd.xml 2025-12-04T11:31:06.0598334Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9a98077fc0a28449.xml 2025-12-04T11:31:06.0967325Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e2de3e4d8afa5ff.xml 2025-12-04T11:31:06.1325083Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-512586046bd1af6f.xml 2025-12-04T11:31:06.1709561Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1fa69b7512f74eae.xml 2025-12-04T11:31:06.2368980Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-70138f82b180a3f5.xml 2025-12-04T11:31:06.2759250Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b7ed61d0627f9533.xml 2025-12-04T11:31:06.3111917Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-493e10e45797f8fa.xml 2025-12-04T11:31:06.3402046Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-87c65811f60e5e0f.xml 2025-12-04T11:31:06.3769278Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-635f35dfbbc33c85.xml 2025-12-04T11:31:06.4107364Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-355930f4da4ab18f.xml 2025-12-04T11:31:06.4432340Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6333fa7d0fe5c91.xml 2025-12-04T11:31:06.4741528Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3076e5b00c0eef07.xml 2025-12-04T11:31:06.5139403Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9141798051401a79.xml 2025-12-04T11:31:06.5497179Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d96c5808f2f4d423.xml 2025-12-04T11:31:06.5886573Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-59eca95b80bf15e4.xml 2025-12-04T11:31:06.6231036Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eeb7f329dcb1625.xml 2025-12-04T11:31:06.6510312Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c438893677b09839.xml 2025-12-04T11:31:06.6801086Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d707ddf229008c6a.xml 2025-12-04T11:31:06.7310143Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4d97d092b2123a2.xml 2025-12-04T11:31:06.7648634Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1574030634816010.xml 2025-12-04T11:31:06.8125093Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5fa3a6eb60f4eca4.xml 2025-12-04T11:31:06.8432173Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e754e92f5037c52.xml 2025-12-04T11:31:06.9244130Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-020049def8c5b0a9.xml 2025-12-04T11:31:06.9718144Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d4dd04eda8983093.xml 2025-12-04T11:31:07.0099788Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a612b5b9d29cdf4.xml 2025-12-04T11:31:07.0622538Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f0f750f594e5734b.xml 2025-12-04T11:31:07.0941011Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7cb1e30e8a2e57ea.xml 2025-12-04T11:31:07.1297118Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc8052641a24d5dc.xml 2025-12-04T11:31:07.1679192Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d8cbbb1187ec0f64.xml 2025-12-04T11:31:07.2025903Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83af7e95786df72.xml 2025-12-04T11:31:07.2371894Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a731f1e0a2629b95.xml 2025-12-04T11:31:07.2757585Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ae47b09c2c50f23.xml 2025-12-04T11:31:07.3097059Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ec880e83b34c8e36.xml 2025-12-04T11:31:07.3490375Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c3833fdae73dbf3c.xml 2025-12-04T11:31:07.3839065Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-86aa7d82374c9e5b.xml 2025-12-04T11:31:07.4148373Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a10e426b5fcbde30.xml 2025-12-04T11:31:07.4486169Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ff35c7e5488dd9ac.xml 2025-12-04T11:31:07.4876497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-924d345c27601ea8.xml 2025-12-04T11:31:07.5280477Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1681683ab3d327ac.xml 2025-12-04T11:31:07.5655676Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22e9fd6e5aba0f0d.xml 2025-12-04T11:31:07.6025472Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9dffcfba1bc1e60.xml 2025-12-04T11:31:07.6379089Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1b652ce23cebda63.xml 2025-12-04T11:31:07.6707092Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5b9a6fa991ecf1c.xml 2025-12-04T11:31:07.6997796Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f3a9e9304d25446.xml 2025-12-04T11:31:07.7291372Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0390eeced956f562.xml 2025-12-04T11:31:07.7567284Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-439532956daa54d1.xml 2025-12-04T11:31:07.7911952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0f977aa3cd3cecaf.xml 2025-12-04T11:31:07.8600189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-24127363c11860de.xml 2025-12-04T11:31:07.8922388Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0cd422e8a222e606.xml 2025-12-04T11:31:07.9167805Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27b9de38969ee6f6.xml 2025-12-04T11:31:07.9429677Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62abfea4d6932c1e.xml 2025-12-04T11:31:07.9805489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e4cf4d2497acecc4.xml 2025-12-04T11:31:08.0079301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0b71a9d976366a8.xml 2025-12-04T11:31:08.0389271Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8c2b944477a517c5.xml 2025-12-04T11:31:08.0748948Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c7a620380978373.xml 2025-12-04T11:31:08.1071810Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aaa461eddd2a0f5.xml 2025-12-04T11:31:08.1416446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5c5af8107d86770.xml 2025-12-04T11:31:08.1826669Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-629d0d3ddf4c3e06.xml 2025-12-04T11:31:08.2140119Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7350065f0535f01a.xml 2025-12-04T11:31:08.2439693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-877f842d3f2815af.xml 2025-12-04T11:31:08.2770492Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c391387e4c62daf7.xml 2025-12-04T11:31:08.3070803Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cea6ac435fa81670.xml 2025-12-04T11:31:08.3379545Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69f0ceb782ba322d.xml 2025-12-04T11:31:08.3667775Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-354a8796ee4ffd32.xml 2025-12-04T11:31:08.3998943Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a60b9c4e3ec8c5.xml 2025-12-04T11:31:08.4269764Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-576d152cd04ca1c5.xml 2025-12-04T11:31:08.4567696Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5733f17598591d18.xml 2025-12-04T11:31:08.4858623Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d06b92a9ae7d27c.xml 2025-12-04T11:31:08.5166242Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ebef8e69977ebea2.xml 2025-12-04T11:31:08.5537835Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea6c158c65373811.xml 2025-12-04T11:31:08.5855568Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2ff679811871b4a.xml 2025-12-04T11:31:08.6158631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc9e37194800f0d1.xml 2025-12-04T11:31:08.6480297Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5145615a66bd578b.xml 2025-12-04T11:31:08.6759408Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-33b7f705a30ded9f.xml 2025-12-04T11:31:08.7047062Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca496a8780de69f3.xml 2025-12-04T11:31:08.7310873Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8bec3baffba656ff.xml 2025-12-04T11:31:08.7610490Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c836ef383c971ad8.xml 2025-12-04T11:31:08.7912136Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deb32df1c36c795c.xml 2025-12-04T11:31:08.8208550Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6dabff71918e7b99.xml 2025-12-04T11:31:08.8478557Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca39e437f793eab2.xml 2025-12-04T11:31:08.8808329Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d93f79d5e733c01.xml 2025-12-04T11:31:08.9119692Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2079ea64f821f40e.xml 2025-12-04T11:31:08.9431605Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eb15a6e33c260556.xml 2025-12-04T11:31:08.9759099Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ae1eb5639088ccd8.xml 2025-12-04T11:31:09.0078944Z Running distributed/test_serialization 1/1 ... [2025-12-04 11:31:09.007392][9500.615309829] 2025-12-04T11:31:09.0079664Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:31:09.0080916Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_serialization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:31:09.007733] 2025-12-04T11:31:13.2330819Z 2025-12-04T11:31:13.2331927Z distributed/test_serialization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_serialization_1.1_13a719996bf7ed77_.log 2025-12-04T11:31:13.2337565Z Running 11 items in this shard: test/distributed/test_serialization.py::TestSerialization::test_cuda, test/distributed/test_serialization.py::TestSerialization::test_dtensor, test/distributed/test_serialization.py::TestSerialization::test_empty_tensor, test/distributed/test_serialization.py::TestSerialization::test_nested_tensors, test/distributed/test_serialization.py::TestSerialization::test_python_object, test/distributed/test_serialization.py::TestSerialization::test_scalar_tensor, test/distributed/test_serialization.py::TestSerialization::test_str_utf8, test/distributed/test_serialization.py::TestSerialization::test_strided_tensor, test/distributed/test_serialization.py::TestSerialization::test_tensor_with_offset, test/distributed/test_serialization.py::TestSerialization::test_various_data_types, test/distributed/test_serialization.py::TestSerialization::test_weights_only 2025-12-04T11:31:13.2342400Z 2025-12-04T11:31:13.2342809Z Finished distributed/test_serialization 1/1 ... [2025-12-04 11:31:13.232757][9504.840668469], took 0.07min 2025-12-04T11:31:13.2605521Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_serialization/distributed.test_serialization-5c3790edbaae9c6a.xml 2025-12-04T11:31:13.3366118Z Running distributed/fsdp/test_fsdp_ignored_modules 1/1 ... [2025-12-04 11:31:13.336091][9504.944008631] 2025-12-04T11:31:13.3366791Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:31:13.3368101Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_ignored_modules.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:31:13.336425] 2025-12-04T11:31:57.6129849Z 2025-12-04T11:31:57.6132554Z distributed/fsdp/test_fsdp_ignored_modules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_ignored_modules_1.1_10f1fa8ebe15ff14_.log 2025-12-04T11:31:57.6138902Z Running 8 items in this shard: test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_diff_ignored_modules_across_ranks, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_invalid, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_nested, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_not_under_wrapped_root_ignore_modules_False, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_not_under_wrapped_root_ignore_modules_True, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_transformer, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_states_auto_wrap, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_states_check 2025-12-04T11:31:57.6143856Z 2025-12-04T11:31:57.6144327Z Finished distributed/fsdp/test_fsdp_ignored_modules 1/1 ... 
[2025-12-04 11:31:57.612826][9549.220738508], took 0.74min 2025-12-04T11:31:57.6410824Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_ignored_modules/distributed.fsdp.test_fsdp_ignored_modules-c4ab0979e06883a2.xml 2025-12-04T11:31:57.7727427Z Running distributed/_composable/fsdp/test_fully_shard_comm 1/1 ... [2025-12-04 11:31:57.772517][9549.380433795] 2025-12-04T11:31:57.7728141Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:31:57.7730517Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_comm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:31:57.772861] 2025-12-04T11:34:53.7938441Z 2025-12-04T11:34:53.7939786Z distributed/_composable/fsdp/test_fully_shard_comm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_comm_1.1_365cd7de0daee87d_.log 2025-12-04T11:34:53.7966989Z Running 22 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCollectiveOps::test_all_gather_fp32, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCollectiveOps::test_reduce_scatter_fp16, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCollectiveOps::test_reduce_scatter_fp32, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCommunication::test_fully_shard_communication_count, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCommunication::test_manual_reshard_with_reshard_after_forward_false, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCommunication::test_set_reduce_scatter_divide_factor, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCommunication::test_set_reshard_after_forward, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_backward_misprefetch, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_fully_shard_backward_prefetch, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_fully_shard_multi_module_backward_prefetch, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_fully_shard_multi_module_unused_module, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_set_modules_to_backward_prefetch, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_set_modules_to_backward_prefetch_inside_ac, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_set_modules_to_forward_prefetch, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardUnshardMultiProcess::test_unshard_async, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardUnshardMultiThread::test_unshard_no_param_group, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardUnshardMultiThread::test_unshard_without_lazy_init, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardAllocFromPG::test_exception_when_used_together_with_comm_hooks, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardAllocFromPG::test_fully_shard_alloc_from_pg, 
test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardForceSumReduction::test_fully_shard_force_sum_both_reductions, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardForceSumReduction::test_fully_shard_force_sum_reduce_scatter, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardReduceOpWorldSize1::test_size1_reduceop 2025-12-04T11:34:53.8018799Z 2025-12-04T11:34:53.8019412Z Finished distributed/_composable/fsdp/test_fully_shard_comm 1/1 ... [2025-12-04 11:34:53.801549][9725.409461872], took 2.93min 2025-12-04T11:34:53.8294070Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_comm/distributed._composable.fsdp.test_fully_shard_comm-b03b971b17f9f8be.xml 2025-12-04T11:34:54.8729148Z Uploading artifacts took 0.91 seconds 2025-12-04T11:34:54.8735071Z Running distributed/fsdp/test_fsdp_sharded_grad_scaler 1/1 ... [2025-12-04 11:34:54.872927][9726.480842753] 2025-12-04T11:34:54.8735730Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:34:54.8737405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_sharded_grad_scaler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:34:54.873272] 2025-12-04T11:37:04.4598751Z 2025-12-04T11:37:04.4599997Z distributed/fsdp/test_fsdp_sharded_grad_scaler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_sharded_grad_scaler_1.1_be49dd131ba0d1a6_.log 2025-12-04T11:37:04.4617118Z Running 20 items in this shard: test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardGradScaler::test_grad_scaling, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardGradScaler::test_inf_gradients_skip_optim_step, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardGradScaler::test_scaling_unscaling_sparse, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_none_mixed_precision_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_none_mixed_precision_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_none_none_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_none_none_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_shard_grad_op_mixed_precision_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_shard_grad_op_mixed_precision_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_shard_grad_op_none_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_shard_grad_op_none_use_orig_params, 
test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_none_mixed_precision_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_none_mixed_precision_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_none_none_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_none_none_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_shard_grad_op_mixed_precision_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_shard_grad_op_mixed_precision_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_shard_grad_op_none_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_shard_grad_op_none_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_sharded_grad_scaler_found_inf 2025-12-04T11:37:04.4633638Z 2025-12-04T11:37:04.4634090Z Finished distributed/fsdp/test_fsdp_sharded_grad_scaler 1/1 ... [2025-12-04 11:37:04.459611][9856.067525451], took 2.16min 2025-12-04T11:37:04.4883871Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_sharded_grad_scaler/distributed.fsdp.test_fsdp_sharded_grad_scaler-830facc45336217a.xml 2025-12-04T11:37:04.6099175Z Running distributed/_shard/sharding_plan/test_sharding_plan 1/1 ... [2025-12-04 11:37:04.609443][9856.217361009] 2025-12-04T11:37:04.6099901Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:37:04.6101441Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_shard/sharding_plan/test_sharding_plan.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:37:04.609817] 2025-12-04T11:37:26.2317186Z 2025-12-04T11:37:26.2320948Z distributed/_shard/sharding_plan/test_sharding_plan 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharding_plan.test_sharding_plan_1.1_abd5760a3cc4b6ac_.log 2025-12-04T11:37:26.2323970Z Running 3 items in this shard: test/distributed/_shard/sharding_plan/test_sharding_plan.py::TestShardingPlan::test_custom_sharding_planner, test/distributed/_shard/sharding_plan/test_sharding_plan.py::TestShardingPlan::test_shard_module_sub_process_group, test/distributed/_shard/sharding_plan/test_sharding_plan.py::TestShardingPlan::test_sharding_plan_errors 2025-12-04T11:37:26.2325804Z 2025-12-04T11:37:26.2326503Z Finished distributed/_shard/sharding_plan/test_sharding_plan 1/1 ... 
[2025-12-04 11:37:26.231209][9877.839119519], took 0.36min 2025-12-04T11:37:26.2596996Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._shard.sharding_plan.test_sharding_plan/distributed._shard.sharding_plan.test_sharding_plan-86fe0d16a378ac71.xml 2025-12-04T11:37:26.3982459Z Running distributed/_shard/sharded_optim/test_sharded_optim 1/1 ... [2025-12-04 11:37:26.398016][9878.005932853] 2025-12-04T11:37:26.3983183Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:37:26.3985333Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_shard/sharded_optim/test_sharded_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:37:26.398349] 2025-12-04T11:37:42.3032431Z 2025-12-04T11:37:42.3033728Z distributed/_shard/sharded_optim/test_sharded_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharded_optim.test_sharded_optim_1.1_eb895e054ba35bc4_.log 2025-12-04T11:37:42.3035995Z Running 2 items in this shard: test/distributed/_shard/sharded_optim/test_sharded_optim.py::TestShardedOptimizer::test_named_params_with_sharded_tensor, test/distributed/_shard/sharded_optim/test_sharded_optim.py::TestShardedOptimizer::test_sharded_optim 2025-12-04T11:37:42.3037277Z 2025-12-04T11:37:42.3037757Z Finished distributed/_shard/sharded_optim/test_sharded_optim 1/1 ... [2025-12-04 11:37:42.302901][9893.910799964], took 0.27min 2025-12-04T11:37:42.3306243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._shard.sharded_optim.test_sharded_optim/distributed._shard.sharded_optim.test_sharded_optim-a8d576a6cb5a21e5.xml 2025-12-04T11:37:42.4701159Z Running distributed/_composable/fsdp/test_fully_shard_state_dict 1/1 ... [2025-12-04 11:37:42.469882][9894.077799811] 2025-12-04T11:37:42.4701917Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:37:42.4703923Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:37:42.470208] 2025-12-04T11:38:27.4483600Z 2025-12-04T11:38:27.4487924Z distributed/_composable/fsdp/test_fully_shard_state_dict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_state_dict_1.1_b527545a7e0cfc76_.log 2025-12-04T11:38:27.4493847Z Running 7 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_2d_state_dict_correctness, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_cached_state_dict, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_dp_state_dict_cpu_offload, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_dp_state_dict_save_load, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_dp_tp_state_dict_save_load, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_hsdp_tp_state_dict_save_load, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiThread::test_rank0_offload_full_state_dict 2025-12-04T11:38:27.4499160Z 2025-12-04T11:38:27.4499685Z Finished distributed/_composable/fsdp/test_fully_shard_state_dict 1/1 ... [2025-12-04 11:38:27.447744][9939.055660093], took 0.75min 2025-12-04T11:38:27.4763732Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_state_dict/distributed._composable.fsdp.test_fully_shard_state_dict-7cd1746803ec2a8b.xml 2025-12-04T11:38:27.6015080Z Running distributed/tensor/test_utils 1/1 ... [2025-12-04 11:38:27.601012][9939.20893012] 2025-12-04T11:38:27.6015678Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:38:27.6017214Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:38:27.601353] 2025-12-04T11:40:00.5524253Z 2025-12-04T11:40:00.5525333Z distributed/tensor/test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_utils_1.1_adf864a1b1c1212f_.log 2025-12-04T11:40:00.5538470Z Running 24 items in this shard: test/distributed/tensor/test_utils.py::LocalTest::test_compute_local_shape_and_global_offset_uneven, test/distributed/tensor/test_utils.py::UtilTest::test_compute_global_tensor_shape_1D, test/distributed/tensor/test_utils.py::UtilTest::test_compute_global_tensor_shape_1D_invalid_shape, test/distributed/tensor/test_utils.py::UtilTest::test_compute_global_tensor_shape_failure_2D, test/distributed/tensor/test_utils.py::UtilTest::test_compute_local_shape_and_global_offset_1D, test/distributed/tensor/test_utils.py::UtilTest::test_compute_local_shape_and_global_offset_2D, test/distributed/tensor/test_utils.py::UtilTest::test_compute_local_shape_and_global_offset_3D, test/distributed/tensor/test_utils.py::UtilTest::test_compute_local_shape_and_global_offset_4D, test/distributed/tensor/test_utils.py::UtilTest::test_fsdp_tp_meta_compute, test/distributed/tensor/test_utils.py::UtilTest::test_hsdp_tp_meta_compute, test/distributed/tensor/test_utils.py::UtilTest::test_uneven_fsdp_tp_meta_compute, test/distributed/tensor/test_utils.py::UtilSingleDeviceTest::test_compute_global_tensor_info_non_shard_placements, test/distributed/tensor/test_utils.py::UtilSingleDeviceTest::test_compute_global_tensor_info_shard_placement, test/distributed/tensor/test_utils.py::UtilSingleDeviceTest::test_compute_global_tensor_info_unsupported_placement, test/distributed/tensor/test_utils.py::UtilSingleDeviceTest::test_compute_tensor_info, test/distributed/tensor/test_utils.py::TestStridedSharding::test_1d_mesh_strided_sharding, test/distributed/tensor/test_utils.py::TestStridedSharding::test_2d_mesh_2d_tensor_strided_sharding, test/distributed/tensor/test_utils.py::TestStridedSharding::test_2d_mesh_strided_sharding, test/distributed/tensor/test_utils.py::TestStridedSharding::test_2d_mesh_uneven_strided_shard, test/distributed/tensor/test_utils.py::Test_StridedShard_with_shard_order::test_StridedShard_not_convertible_to_shard_order, test/distributed/tensor/test_utils.py::Test_StridedShard_with_shard_order::test_StridedShard_to_shard_order, test/distributed/tensor/test_utils.py::Test2DStridedLocalShard::test_fsdp1_tp_2d_dtensor_local_shards_and_offsets, test/distributed/tensor/test_utils.py::Test2DStridedLocalShard::test_fsdp2_tp_2d_dtensor_local_shards_and_offsets, test/distributed/tensor/test_utils.py::TestExplicitRedistribute::test_explicit_matmul 2025-12-04T11:40:00.5550685Z 2025-12-04T11:40:00.5551075Z Finished distributed/tensor/test_utils 1/1 ... [2025-12-04 11:40:00.552363][10032.160276342], took 1.55min 2025-12-04T11:40:00.5814276Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_utils/distributed.tensor.test_utils-ce4dc3e67348c080.xml 2025-12-04T11:40:00.7173345Z Running distributed/_composable/fsdp/test_fully_shard_memory 1/1 ... 
[2025-12-04 11:40:00.716666][10032.324584443] 2025-12-04T11:40:00.7174085Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:40:00.7175648Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_memory.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:40:00.717008] 2025-12-04T11:40:15.5191075Z 2025-12-04T11:40:15.5192733Z distributed/_composable/fsdp/test_fully_shard_memory 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_memory_1.1_49e4cc8ab7bdec96_.log 2025-12-04T11:40:15.5195092Z Running 2 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_memory.py::TestFullyShardMemory::test_fully_shard_del_memory, test/distributed/_composable/fsdp/test_fully_shard_memory.py::TestFullyShardMemory::test_fully_shard_training_memory 2025-12-04T11:40:15.5196432Z 2025-12-04T11:40:15.5196941Z Finished distributed/_composable/fsdp/test_fully_shard_memory 1/1 ... [2025-12-04 11:40:15.518759][10047.126668625], took 0.25min 2025-12-04T11:40:15.5476384Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_memory/distributed._composable.fsdp.test_fully_shard_memory-bd84ca434b9abee9.xml 2025-12-04T11:40:15.6736960Z Running distributed/checkpoint/test_state_dict 1/1 ... [2025-12-04 11:40:15.673182][10047.28109898] 2025-12-04T11:40:15.6737845Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:40:15.6739170Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:40:15.673525] 2025-12-04T11:43:11.5121375Z 2025-12-04T11:43:11.5122831Z distributed/checkpoint/test_state_dict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_state_dict_1.1_211422b52eb9ecc9_.log 2025-12-04T11:43:11.5136539Z Running 25 items in this shard: test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_activation_ckpt_fqns_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_activation_ckpt_fqns_fsdp1, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_broadcast_from_rank0, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_compiled_fsdp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_cpu_offload_full_state_dict, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_deprecate_api, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_extra_state, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_flattened_osd, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp2, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp_root_not_initialized, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_multi_device_load_model_state_dict, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_multi_param_groups, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_non_persistent_buffers, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_optim_state_dict_param_matching, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_set_cpu_model_state_dict_broadcast_from_rank0, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_setting_meta_device_model, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_setting_meta_device_model_broadcasting_and_memory, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_shared_weight, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_single_gpu, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_state_dict_with_hook_on_keys, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_strict, test/distributed/checkpoint/test_state_dict.py::TestNoComm::test_no_dist 2025-12-04T11:43:11.5148389Z 2025-12-04T11:43:11.5149046Z Finished distributed/checkpoint/test_state_dict 1/1 ... [2025-12-04 11:43:11.514244][10223.122157065], took 2.93min 2025-12-04T11:43:11.5433319Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_state_dict/distributed.checkpoint.test_state_dict-82ab38e24fe889c8.xml 2025-12-04T11:43:11.6687028Z Running distributed/checkpoint/test_state_dict_utils 1/1 ... [2025-12-04 11:43:11.668031][10223.275949038] 2025-12-04T11:43:11.6687721Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:43:11.6689056Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_state_dict_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:43:11.668370] 2025-12-04T11:43:56.8999016Z 2025-12-04T11:43:56.9000256Z distributed/checkpoint/test_state_dict_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_state_dict_utils_1.1_53a76f3501a79ced_.log 2025-12-04T11:43:56.9005206Z Running 7 items in this shard: test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_complicated_dict, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_cpu_and_ranks_only, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_cpu_offload_for_dtensor, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_create_cpu_state_dict, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_gather_state_dict_dtensor, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_gather_with_cpu_and_ranks_only, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_state_dict_util_distribute_tensors 2025-12-04T11:43:56.9008950Z 2025-12-04T11:43:56.9009403Z Finished distributed/checkpoint/test_state_dict_utils 1/1 ... [2025-12-04 11:43:56.899410][10268.507325425], took 0.75min 2025-12-04T11:43:56.9284531Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_state_dict_utils/distributed.checkpoint.test_state_dict_utils-a19642af8d31d778.xml 2025-12-04T11:43:57.0464158Z Running distributed/rpc/test_faulty_agent 1/1 ... [2025-12-04 11:43:57.046192][10268.654108426] 2025-12-04T11:43:57.0464785Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:43:57.0467226Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/rpc/test_faulty_agent.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:43:57.046531] 2025-12-04T11:44:00.8263906Z 2025-12-04T11:44:00.8265036Z distributed/rpc/test_faulty_agent 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.rpc.test_faulty_agent_1.1_9f30efe05bf109e0_.log 2025-12-04T11:44:00.8266372Z Running 0 items in this shard: 2025-12-04T11:44:00.8266604Z 2025-12-04T11:44:00.8267007Z Finished distributed/rpc/test_faulty_agent 1/1 ... [2025-12-04 11:44:00.826215][10272.434131207], took 0.06min 2025-12-04T11:44:00.8970487Z Running distributed/_shard/sharded_tensor/ops/test_embedding 1/1 ... [2025-12-04 11:44:00.896508][10272.504424945] 2025-12-04T11:44:00.8971190Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:44:00.8972574Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_embedding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:44:00.896884] 2025-12-04T11:44:16.7032218Z 2025-12-04T11:44:16.7034609Z distributed/_shard/sharded_tensor/ops/test_embedding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharded_tensor.ops.test_embedding_1.1_94d647ccb113bbd0_.log 2025-12-04T11:44:16.7038253Z Running 2 items in this shard: test/distributed/_shard/sharded_tensor/ops/test_embedding.py::TestShardedEmbedding::test_sharded_embedding_colwise, test/distributed/_shard/sharded_tensor/ops/test_embedding.py::TestShardedEmbedding::test_sharded_embedding_rowwise 2025-12-04T11:44:16.7040350Z 2025-12-04T11:44:16.7040935Z Finished distributed/_shard/sharded_tensor/ops/test_embedding 1/1 ... [2025-12-04 11:44:16.702789][10288.310704722], took 0.26min 2025-12-04T11:44:16.7403957Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._shard.sharded_tensor.ops.test_embedding/distributed._shard.sharded_tensor.ops.test_embedding-fd33e5d9c41f35fb.xml 2025-12-04T11:44:16.8706644Z Running distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 1/1 ... [2025-12-04 11:44:16.870425][10288.478342458] 2025-12-04T11:44:16.8707432Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:44:16.8710044Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_shard/sharded_tensor/test_sharded_tensor_reshard.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:44:16.870795] 2025-12-04T11:44:32.5738637Z 2025-12-04T11:44:32.5740063Z distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharded_tensor.test_sharded_tensor_reshard_1.1_41e70f878ccc4095_.log 2025-12-04T11:44:32.5742762Z Running 2 items in this shard: test/distributed/_shard/sharded_tensor/test_sharded_tensor_reshard.py::TestReshard::test_sharded_tensor_reshard, test/distributed/_shard/sharded_tensor/test_sharded_tensor_reshard.py::TestReshard::test_sharded_tensor_reshard_errors 2025-12-04T11:44:32.5744274Z 2025-12-04T11:44:32.5744804Z Finished distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 1/1 ... [2025-12-04 11:44:32.573494][10304.181410122], took 0.26min 2025-12-04T11:44:32.6029920Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard/distributed._shard.sharded_tensor.test_sharded_tensor_reshard-e6bc79067fb0604d.xml 2025-12-04T11:44:32.7219270Z Running distributed/test_c10d_spawn_nccl 1/1 ... [2025-12-04 11:44:32.721658][10304.329575497] 2025-12-04T11:44:32.7219904Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:44:32.7222104Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_spawn_nccl.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:44:32.722004] 2025-12-04T11:46:03.5481274Z 2025-12-04T11:46:03.5482461Z distributed/test_c10d_spawn_nccl 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_spawn_nccl_1.1_1bf221cec02d55ca_.log 2025-12-04T11:46:03.5488670Z Running 10 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_gather, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_gather_base, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_reduce_non_contiguous, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_to_all, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_to_all_single, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_allreduce, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_broadcast, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce_scatter, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce_scatter_non_contiguous 2025-12-04T11:46:03.5494046Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_gather 2025-12-04T11:46:03.5495145Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_gather_base 2025-12-04T11:46:03.5496339Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_reduce_non_contiguous 2025-12-04T11:46:03.5497846Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_to_all 2025-12-04T11:46:03.5499011Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_to_all_single 2025-12-04T11:46:03.5500139Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_allreduce 2025-12-04T11:46:03.5501242Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_broadcast 2025-12-04T11:46:03.5502330Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce 2025-12-04T11:46:03.5503431Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce_scatter 2025-12-04T11:46:03.5504662Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce_scatter_non_contiguous 2025-12-04T11:46:03.5505403Z 2025-12-04T11:46:03.5505795Z Finished distributed/test_c10d_spawn_nccl 1/1 ... 
[2025-12-04 11:46:03.547749][10395.155665929], took 1.51min 2025-12-04T11:46:03.5777297Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ef4942791579d03.xml 2025-12-04T11:46:03.6602735Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-d882aa7ed351d2b7.xml 2025-12-04T11:46:03.6881287Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-e41d47243c13be74.xml 2025-12-04T11:46:03.7251079Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ed2ccb680132309.xml 2025-12-04T11:46:03.7583284Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-a86d7398eb9ff93b.xml 2025-12-04T11:46:03.7859693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-50f191d4627fdfd2.xml 2025-12-04T11:46:03.8189611Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8cb70355957e1b4b.xml 2025-12-04T11:46:03.8439672Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-bbde3500be39702b.xml 2025-12-04T11:46:03.8773214Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-1805de606cf78685.xml 2025-12-04T11:46:03.9077832Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8a898c87fa4f8fd3.xml 2025-12-04T11:46:03.9768869Z Running distributed/test_c10d_spawn_ucc 1/1 ... [2025-12-04 11:46:03.976294][10395.584211275] 2025-12-04T11:46:03.9769499Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:46:03.9770948Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_spawn_ucc.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:46:03.976640] 2025-12-04T11:46:27.5323971Z 2025-12-04T11:46:27.5325051Z distributed/test_c10d_spawn_ucc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_spawn_ucc_1.1_5521268884e60126_.log 2025-12-04T11:46:27.5328753Z Running 6 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_gather, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all_single, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_allreduce, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_broadcast, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_reduce 2025-12-04T11:46:27.5332173Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_gather 2025-12-04T11:46:27.5333259Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all 2025-12-04T11:46:27.5334380Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all_single 2025-12-04T11:46:27.5335479Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_allreduce 2025-12-04T11:46:27.5336986Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_broadcast 2025-12-04T11:46:27.5338075Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_reduce 2025-12-04T11:46:27.5338671Z 2025-12-04T11:46:27.5339065Z Finished distributed/test_c10d_spawn_ucc 1/1 ... [2025-12-04 11:46:27.532007][10419.139924091], took 0.39min 2025-12-04T11:46:27.5623900Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-41764b12ccdf212e.xml 2025-12-04T11:46:27.6584472Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-aee5aa2ded024d85.xml 2025-12-04T11:46:27.6922138Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-8800a2e7b955ab16.xml 2025-12-04T11:46:27.7332807Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-3a092f5472894a7f.xml 2025-12-04T11:46:27.7637922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-f628509e7e3f2a1f.xml 2025-12-04T11:46:27.7979750Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-c1a78b733abc6caa.xml 2025-12-04T11:46:27.8726936Z Running distributed/test_c10d_gloo 1/2 ... 
[2025-12-04 11:46:27.872488][10419.480405833] 2025-12-04T11:46:27.8727539Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:46:27.8729955Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_gloo.py', '--shard-id=1', '--num-shards=2', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:46:27.872806] 2025-12-04T12:03:45.7298560Z 2025-12-04T12:03:45.7299957Z distributed/test_c10d_gloo 1/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_gloo_1.2_d5d0e2b1d744a982_.log 2025-12-04T12:03:45.7360290Z Running 127 items in this shard: test/distributed/test_c10d_gloo.py::RendezvousTCPTest::test_tcp_init, test/distributed/test_c10d_gloo.py::TimeoutTest::test_default_store_timeout_gloo, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_into_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_checks_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_overall_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_barrier_implies_wait, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_empty_tensors, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_multi_device_constructor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_send_recv_all_to_all, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_set_gloo_pg_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_dataclass_output, 
test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_module, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_weight_sharing, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_cpu, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_gpu_gloo, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_sparse_gradients, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad_with_grad_is_view, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module_grad_is_view, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_output, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_sharded_tensor, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients_grad_is_view, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sync_batch_norm_empty_input, test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward, test/distributed/test_c10d_gloo.py::ReducerTest::test_multi_dtype_single_bucket, test/distributed/test_c10d_gloo.py::ReducerTest::test_single_dtype_single_bucket, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_inference_mode, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_into_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_stress, 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_op_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_overall_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_block_current_stream_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_empty_tensors, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_send_recv_all_to_all, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_set_gloo_pg_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_inference_mode, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_op_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_overall_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_block_current_stream_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_stress_cuda, 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_all_to_all, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_complex, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_set_gloo_pg_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_short_json, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_checks, test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cpu, test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cuda, test/distributed/test_c10d_gloo.py::CommTest::test_gloo_rank_membership, test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_default_pg_gloo, test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_gloo_new_group, test/distributed/test_c10d_gloo.py::CommTest::test_tensor_dtype_complex, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_allgather_coalesced, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends, test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync_duplicate_pg 2025-12-04T12:03:45.7418140Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::RendezvousTCPTest::test_tcp_init 2025-12-04T12:03:45.7419107Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::TimeoutTest::test_default_store_timeout_gloo 2025-12-04T12:03:45.7420167Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_async 2025-12-04T12:03:45.7421521Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_checks 2025-12-04T12:03:45.7422656Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_into_tensor_coalesced 2025-12-04T12:03:45.7423739Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress 2025-12-04T12:03:45.7424767Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress_cuda 2025-12-04T12:03:45.7425794Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics 2025-12-04T12:03:45.7426807Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics_cuda 2025-12-04T12:03:45.7427835Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_checks 2025-12-04T12:03:45.7428886Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_basics 2025-12-04T12:03:45.7430023Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_checks_cuda 2025-12-04T12:03:45.7431141Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_overall_timeout 2025-12-04T12:03:45.7432191Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_stress 2025-12-04T12:03:45.7433372Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_barrier_implies_wait 2025-12-04T12:03:45.7434282Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics 2025-12-04T12:03:45.7435277Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics_cuda 2025-12-04T12:03:45.7436196Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress 2025-12-04T12:03:45.7437106Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress_cuda 2025-12-04T12:03:45.7438008Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_empty_tensors 2025-12-04T12:03:45.7438885Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress_cuda 2025-12-04T12:03:45.7439997Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_multi_device_constructor 2025-12-04T12:03:45.7440964Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_basics 2025-12-04T12:03:45.7441891Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress 2025-12-04T12:03:45.7442816Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress_cuda 2025-12-04T12:03:45.7443812Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics 2025-12-04T12:03:45.7444758Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics_cuda 2025-12-04T12:03:45.7445708Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_stress 2025-12-04T12:03:45.7446643Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_send_recv_all_to_all 2025-12-04T12:03:45.7447625Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_set_gloo_pg_timeout 2025-12-04T12:03:45.7448615Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics 2025-12-04T12:03:45.7449648Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics_cuda 2025-12-04T12:03:45.7450705Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_dataclass_output 2025-12-04T12:03:45.7451809Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_module 2025-12-04T12:03:45.7453112Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_weight_sharing 2025-12-04T12:03:45.7454326Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_False 2025-12-04T12:03:45.7455538Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_True 2025-12-04T12:03:45.7457055Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_True 2025-12-04T12:03:45.7458499Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_False 2025-12-04T12:03:45.7459881Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_True 2025-12-04T12:03:45.7461290Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_True 2025-12-04T12:03:45.7462724Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_True 2025-12-04T12:03:45.7464121Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_cpu 2025-12-04T12:03:45.7465450Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_gpu_gloo 2025-12-04T12:03:45.7466729Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_sparse_gradients 2025-12-04T12:03:45.7467969Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad 2025-12-04T12:03:45.7469391Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad_with_grad_is_view 2025-12-04T12:03:45.7470644Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module 2025-12-04T12:03:45.7471842Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module_grad_is_view 2025-12-04T12:03:45.7472984Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_output 2025-12-04T12:03:45.7474093Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_sharded_tensor 2025-12-04T12:03:45.7475175Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients 2025-12-04T12:03:45.7476299Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients_grad_is_view 2025-12-04T12:03:45.7477472Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sync_batch_norm_empty_input 2025-12-04T12:03:45.7478511Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward 2025-12-04T12:03:45.7479443Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_multi_dtype_single_bucket 2025-12-04T12:03:45.7480420Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_single_dtype_single_bucket 2025-12-04T12:03:45.7481652Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_checks 2025-12-04T12:03:45.7482678Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_coalesced_checks 2025-12-04T12:03:45.7483749Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_inference_mode 2025-12-04T12:03:45.7484834Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_into_tensor_coalesced 2025-12-04T12:03:45.7485865Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_stress 2025-12-04T12:03:45.7486864Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_basics_cuda 2025-12-04T12:03:45.7487855Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_checks 2025-12-04T12:03:45.7488872Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_async 2025-12-04T12:03:45.7489932Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks 2025-12-04T12:03:45.7491029Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks_cuda 2025-12-04T12:03:45.7492127Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_stress 2025-12-04T12:03:45.7493166Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_op_timeout 2025-12-04T12:03:45.7494241Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_overall_timeout 2025-12-04T12:03:45.7495266Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress 2025-12-04T12:03:45.7496267Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress_cuda 2025-12-04T12:03:45.7497629Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_block_current_stream_cuda 2025-12-04T12:03:45.7498757Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_basics 2025-12-04T12:03:45.7499846Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress 2025-12-04T12:03:45.7500967Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress_cuda 2025-12-04T12:03:45.7502078Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_empty_tensors 2025-12-04T12:03:45.7503167Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_basics 2025-12-04T12:03:45.7504228Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_checks 2025-12-04T12:03:45.7505361Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_noncontiguous_input 2025-12-04T12:03:45.7506527Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_stress_cuda 2025-12-04T12:03:45.7507641Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor 2025-12-04T12:03:45.7508956Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor_coalesced 2025-12-04T12:03:45.7510108Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_stress 2025-12-04T12:03:45.7511091Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_basics 2025-12-04T12:03:45.7512041Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_checks 2025-12-04T12:03:45.7513019Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_send_recv_all_to_all 2025-12-04T12:03:45.7514016Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_set_gloo_pg_timeout 2025-12-04T12:03:45.7515042Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_basics 2025-12-04T12:03:45.7516011Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics 2025-12-04T12:03:45.7516948Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics_cuda 2025-12-04T12:03:45.7517929Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_async 2025-12-04T12:03:45.7518921Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_checks 2025-12-04T12:03:45.7519902Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_inference_mode 2025-12-04T12:03:45.7520992Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress 2025-12-04T12:03:45.7522191Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress_cuda 2025-12-04T12:03:45.7523373Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_basics_cuda 2025-12-04T12:03:45.7524463Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_stress 2025-12-04T12:03:45.7525557Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_op_timeout 2025-12-04T12:03:45.7526646Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_overall_timeout 2025-12-04T12:03:45.7527719Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_stress 2025-12-04T12:03:45.7528775Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_block_current_stream_cuda 2025-12-04T12:03:45.7529850Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress 2025-12-04T12:03:45.7530911Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress_cuda 2025-12-04T12:03:45.7531965Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_basics_cuda 2025-12-04T12:03:45.7533014Z Running 1 items in 
this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_checks 2025-12-04T12:03:45.7534085Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_stress_cuda 2025-12-04T12:03:45.7534996Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_checks 2025-12-04T12:03:45.7535882Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter 2025-12-04T12:03:45.7537083Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter_tensor_coalesced 2025-12-04T12:03:45.7538214Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_basics_cuda 2025-12-04T12:03:45.7539267Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_stress_cuda 2025-12-04T12:03:45.7540371Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_all_to_all 2025-12-04T12:03:45.7541405Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_complex 2025-12-04T12:03:45.7542446Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_set_gloo_pg_timeout 2025-12-04T12:03:45.7543452Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_short_json 2025-12-04T12:03:45.7544514Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_basics_cuda 2025-12-04T12:03:45.7545636Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_checks 2025-12-04T12:03:45.7546679Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cpu 2025-12-04T12:03:45.7547681Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cuda 2025-12-04T12:03:45.7548776Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_gloo_rank_membership 2025-12-04T12:03:45.7549764Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_default_pg_gloo 2025-12-04T12:03:45.7550674Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_gloo_new_group 2025-12-04T12:03:45.7551547Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_tensor_dtype_complex 2025-12-04T12:03:45.7552627Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_allgather_coalesced 2025-12-04T12:03:45.7553952Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends 2025-12-04T12:03:45.7555144Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync_duplicate_pg 2025-12-04T12:03:45.7555701Z 2025-12-04T12:03:45.7556021Z Finished distributed/test_c10d_gloo 1/2 ... 
[2025-12-04 12:03:45.732891][11457.340804599], took 17.30min 2025-12-04T12:03:45.7651137Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0991bf72558fb22b.xml 2025-12-04T12:03:45.8517106Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aa6ce215ba96a24c.xml 2025-12-04T12:03:45.8847153Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-16fe1d620732710b.xml 2025-12-04T12:03:45.9138685Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3fe1795a5d3e5b88.xml 2025-12-04T12:03:45.9434528Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6c7276bb9fa9eee2.xml 2025-12-04T12:03:45.9737274Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cd50578f9742b761.xml 2025-12-04T12:03:46.0177834Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5e60172a210dc8b6.xml 2025-12-04T12:03:46.0542866Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-873ae68d43267ac9.xml 2025-12-04T12:03:46.1335597Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-34c50e4612c9fea4.xml 2025-12-04T12:03:46.1667362Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d54fb6be7a931b62.xml 2025-12-04T12:03:46.1976109Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2259b8bd184524fc.xml 2025-12-04T12:03:46.2363983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8f01caa16144b040.xml 2025-12-04T12:03:46.2700167Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-31de274c3cb59c01.xml 2025-12-04T12:03:46.3393837Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-db19637423ab0dbc.xml 2025-12-04T12:03:46.3745257Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b23ea90304491b65.xml 2025-12-04T12:03:46.4157594Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eaee01f734bb6504.xml 2025-12-04T12:03:46.4515812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0fa860b184f8ddb6.xml 2025-12-04T12:03:46.4817096Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33cbbe588c8f840c.xml 2025-12-04T12:03:46.5168477Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-de8dc85b62067611.xml 2025-12-04T12:03:46.5476040Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0f2cd4f378b677f0.xml 2025-12-04T12:03:46.5796862Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e35b0454119a9f51.xml 2025-12-04T12:03:46.6146863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d98cd20152af5d53.xml 2025-12-04T12:03:46.6475309Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3982ee850d6ce795.xml 2025-12-04T12:03:46.6794948Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-08455987c8f710af.xml 2025-12-04T12:03:46.7146492Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e90446a7a06b5b78.xml 2025-12-04T12:03:46.7497986Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3abd929020861bdc.xml 2025-12-04T12:03:46.7787204Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d79cb42da7e54a79.xml 2025-12-04T12:03:46.8116168Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1a14244d1e7f6bb2.xml 2025-12-04T12:03:46.8456755Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a80b6bac28c5c972.xml 2025-12-04T12:03:46.8847128Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bf45f3c093461361.xml 2025-12-04T12:03:46.9169088Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-81160b788c5abcc2.xml 2025-12-04T12:03:46.9538634Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2242d642afc7f886.xml 2025-12-04T12:03:46.9821386Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-327f840cbb3f5094.xml 2025-12-04T12:03:47.0162806Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-724f786ab432a45b.xml 2025-12-04T12:03:47.0477211Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aae15a76989ce46a.xml 2025-12-04T12:03:47.0795821Z Parsing testcases for 
test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ee273f849859fe9.xml 2025-12-04T12:03:47.1123280Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93baf128de560649.xml 2025-12-04T12:03:47.1425057Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1f85ec05eddb726d.xml 2025-12-04T12:03:47.1749805Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c9eb752317a73e18.xml 2025-12-04T12:03:47.2089061Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cedb520e520b4782.xml 2025-12-04T12:03:47.2415208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e69dd1a2e9fba2dc.xml 2025-12-04T12:03:47.2724155Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-47c9021380160661.xml 2025-12-04T12:03:47.3037066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-681adc1d59f04282.xml 2025-12-04T12:03:47.3338021Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1755a27e81246495.xml 2025-12-04T12:03:47.3624999Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b2036226275eb311.xml 2025-12-04T12:03:47.3924171Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f50e0fff8c24c86.xml 2025-12-04T12:03:47.4254766Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d908f57090f2acd6.xml 2025-12-04T12:03:47.4565730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ac7a92e764fd2c8b.xml 2025-12-04T12:03:47.4886782Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2f80e6d84c47c0a7.xml 2025-12-04T12:03:47.5219996Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2042e0d50243da8a.xml 2025-12-04T12:03:47.5543189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9adcd8663666ac.xml 2025-12-04T12:03:47.5938694Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-246370ceca8d8d8b.xml 2025-12-04T12:03:47.6269277Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f75c8f9699a93e6a.xml 2025-12-04T12:03:47.6538596Z Parsing 
testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-830d90348309a50c.xml 2025-12-04T12:03:47.6856341Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-257d76299fdbf250.xml 2025-12-04T12:03:47.7138780Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fa0b0b810d894be9.xml 2025-12-04T12:03:47.7437334Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b713da153aca8219.xml 2025-12-04T12:03:47.7779214Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-812da336a80f282a.xml 2025-12-04T12:03:47.8089863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2be07987a59e5da5.xml 2025-12-04T12:03:47.8375499Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0d952f420fed2de5.xml 2025-12-04T12:03:47.8676875Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d29bf39728651f67.xml 2025-12-04T12:03:47.9018276Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-01e88d26c5e6aa85.xml 2025-12-04T12:03:47.9313655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efe3194372b4e6.xml 2025-12-04T12:03:47.9650377Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ccf063a53847c36.xml 2025-12-04T12:03:47.9965912Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-72be92db0e827d7f.xml 2025-12-04T12:03:48.0338836Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-84f86de4e3aa962a.xml 2025-12-04T12:03:48.0607900Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e5c4d09fb827cb7f.xml 2025-12-04T12:03:48.0938840Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-165d83ae78886ff8.xml 2025-12-04T12:03:48.1275456Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-76f6fcd9346eff0a.xml 2025-12-04T12:03:48.1581243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e84bdf3d05666f91.xml 2025-12-04T12:03:48.1907586Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a357bf2b1c694c62.xml 
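Each "Parsing testcases for test report" entry above refers to a pytest JUnit-style XML file under test/test-reports/python-pytest/. As a rough illustration of what that step amounts to (this is not the CI's actual parser; the glob pattern and attribute names below are assumptions based on standard pytest --junitxml output), a minimal Python sketch that tallies results across those report files could look like this:

# Sketch (assumption, not the CI tooling): tally results from the junit-xml
# reports listed in the log above. Standard pytest --junitxml files carry
# per-suite "tests" / "failures" / "errors" / "skipped" counts as attributes.
import glob
import xml.etree.ElementTree as ET

def summarize_reports(pattern="test/test-reports/python-pytest/**/*.xml"):
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for path in glob.glob(pattern, recursive=True):
        root = ET.parse(path).getroot()
        # The root may be <testsuites> or a bare <testsuite>; iter() handles both.
        for suite in root.iter("testsuite"):
            for key in totals:
                totals[key] += int(suite.get(key, 0))
    return totals

if __name__ == "__main__":
    print(summarize_reports())

The reported durations are consistent with the bracketed monotonic stamps in the log: for distributed/test_c10d_gloo 1/2, the run starts at 10419.480 and finishes at 11457.341, i.e. roughly 1037.9 s or about 17.30 min, matching the "took 17.30min" line.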
2025-12-04T12:03:48.2244767Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b1b5f73bcb8b828f.xml 2025-12-04T12:03:48.2558464Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e742397162ed9e3d.xml 2025-12-04T12:03:48.2887921Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f3a1c05a7b5c0fa8.xml 2025-12-04T12:03:48.3166071Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fcd37833b58d4bea.xml 2025-12-04T12:03:48.3508328Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e22bb2e46b3ab636.xml 2025-12-04T12:03:48.3899002Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d319014b034c95bf.xml 2025-12-04T12:03:48.4176851Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-393bf6208ab91711.xml 2025-12-04T12:03:48.4540937Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9e40b9771000a0.xml 2025-12-04T12:03:48.4894757Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d597ca27d8328fc4.xml 2025-12-04T12:03:48.5222679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ff18cf4d50e44f39.xml 2025-12-04T12:03:48.5614685Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0be906a8969ec101.xml 2025-12-04T12:03:48.5937640Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-158f1ad05ae2a64b.xml 2025-12-04T12:03:48.6298983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87453a67a1ebaea6.xml 2025-12-04T12:03:48.6597307Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-94f3fac53aec8990.xml 2025-12-04T12:03:48.7480230Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93576123b2405b32.xml 2025-12-04T12:03:48.7894987Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f6666d1683ab3f1d.xml 2025-12-04T12:03:48.8182589Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-54b039aca43fe5b7.xml 2025-12-04T12:03:48.8489570Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8eea24e340cd482b.xml 2025-12-04T12:03:48.8846090Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-abf845b544fb7d20.xml 2025-12-04T12:03:48.9208830Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f27d8d563aeff333.xml 2025-12-04T12:03:48.9505653Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b98a8d5dfa728efd.xml 2025-12-04T12:03:48.9860286Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f9a146a8fac2af4d.xml 2025-12-04T12:03:49.0178728Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d8bb6ca9e3ae378b.xml 2025-12-04T12:03:49.0466096Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-604db34ae5cbb6b2.xml 2025-12-04T12:03:49.0849833Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6d6d34df2e34630b.xml 2025-12-04T12:03:49.1158760Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-520dfe050df69b4b.xml 2025-12-04T12:03:49.1475655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2074cd035f8dc8fc.xml 2025-12-04T12:03:49.1765777Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-468dffdf4603fb37.xml 2025-12-04T12:03:49.2077001Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fb8500504162f453.xml 2025-12-04T12:03:49.2378972Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-56d2f4c749889dbc.xml 2025-12-04T12:03:49.3238505Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8cef0d6061a45be8.xml 2025-12-04T12:03:49.3538233Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93d1d438aff7bb95.xml 2025-12-04T12:03:49.3843108Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5c11159a66fb94a9.xml 2025-12-04T12:03:49.4166155Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c1ea079cea0d8e56.xml 2025-12-04T12:03:49.4486236Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f25b64af298ca601.xml 2025-12-04T12:03:49.4796217Z Parsing testcases for 
test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87383ac3904bfe89.xml 2025-12-04T12:03:49.5096740Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d793a1fedd0d4f15.xml 2025-12-04T12:03:49.5418730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b67795a049190b1d.xml 2025-12-04T12:03:49.5757555Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bde1923c97f63381.xml 2025-12-04T12:03:49.6157520Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2540c713fc68453d.xml 2025-12-04T12:03:49.6471275Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8d1d058689da62ff.xml 2025-12-04T12:03:49.6785367Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0c93a8978347968a.xml 2025-12-04T12:03:49.7069485Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-18641772917d69fc.xml 2025-12-04T12:03:49.7379022Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6a77c9a2c337df36.xml 2025-12-04T12:03:49.7695565Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efbb19e469ebb7.xml 2025-12-04T12:03:49.8019536Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eac363af2c24f931.xml 2025-12-04T12:03:49.8302975Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33bf8b4540a40636.xml 2025-12-04T12:03:49.8618007Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-45778cf420dbd19f.xml 2025-12-04T12:03:49.8917062Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-7dfffc535a3e90f1.xml 2025-12-04T12:03:49.9224500Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4b2795b0e7efac26.xml 2025-12-04T12:03:49.9516655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2b369bec34855654.xml 2025-12-04T12:03:49.9807734Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d6b15d261538e27e.xml 2025-12-04T12:03:50.0138843Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ef76d7bc1711751.xml 2025-12-04T12:03:50.0445582Z Parsing 
testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0343427a5558824f.xml 2025-12-04T12:03:50.0723784Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f70a63e56a4848b.xml 2025-12-04T12:03:50.1037366Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-821ac567b5ed63bc.xml 2025-12-04T12:03:50.6885198Z Uploading artifacts took 0.51 seconds 2025-12-04T12:03:50.6887332Z Running distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 ... [2025-12-04 12:03:50.688442][11462.296357453] 2025-12-04T12:03:50.6888116Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:03:50.6889789Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_shard/sharded_tensor/test_sharded_tensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:03:50.688788] 2025-12-04T12:13:17.7419520Z 2025-12-04T12:13:17.7421157Z distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharded_tensor.test_sharded_tensor_1.1_24bd8bcdd0ba69c1_.log 2025-12-04T12:13:17.7467835Z Running 74 items in this shard: test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorMetadata::test_serialize_and_deserialize, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestCreateTensorFromParams::test_empty, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardParameter::test_shard_parameter, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardParameter::test_shard_parameter_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardTensor::test_shard_tensor, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardTensor::test_shard_tensor_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardTensor::test_shard_tensor_with_empty_shard, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestModuleHookApi::test_collect_local_shard, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestModuleHookApi::test_reshard_output, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestLocalTensor::test_local_tensor, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestLocalTensor::test_local_tensor_error, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_cleanup, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_complete_world_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_like, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_full, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_ones, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_rand, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_zeros, 
test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_gather_even, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_gather_uneven, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_insufficient_sharding_dims, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_invalid_pg_rpc_ranks, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_invalid_sharding, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_load_state_dict_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_multiple_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_partial_world_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_sharded_tensor_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_sharded_tensor_sizes, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_sharding_columns, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_state_dict, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_state_dict_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_state_dict_no_sharded_tensors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_create_sharded_tensor_with_ones, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_gather_even, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_gather_uneven, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_grid_sharding, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_multiple_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_partial_world_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_device, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_to_cpu, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_to_cuda, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_to_test, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_uneven_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_with_rpc_names, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalTensor::test_init_from_local_tensor, 
test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalTensor::test_init_from_local_tensor_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata_invalid_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata_with_all_zeros, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata_with_local_view, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_pin_memory, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_property_cross_ranks, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_shards_gaps, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_shards_overlap, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_with_different_glb_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_non_rw_sharded_recalc_for_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_recalc_for_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_st_base_init_from_local_shards_and_global_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorCustomOps::test_custom_op, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorCustomOps::test_custom_op_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorCustomOps::test_custom_op_override, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardMetadata::test_create_shard_with_no_placement, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardMetadata::test_shard_metadata_init, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorSubGroupInit::test_sub_process_group_placement_validation, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorSubGroupInit::test_sub_process_group_sharded_tensor_init, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestCreateTensorNoProcessGroupMode::test_init_from_local_shards_and_global_metadata, 
test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestCreateTensorNoProcessGroupMode::test_non_contiguous_local_shards 2025-12-04T12:13:17.7513214Z 2025-12-04T12:13:17.7513694Z Finished distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 ... [2025-12-04 12:13:17.743251][12029.351163338], took 9.45min 2025-12-04T12:13:17.7883215Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor/distributed._shard.sharded_tensor.test_sharded_tensor-ae33be926ad38292.xml 2025-12-04T12:13:17.9119303Z Running distributed/test_c10d_nccl 3/3 ... [2025-12-04 12:13:17.911328][12029.51924529] 2025-12-04T12:13:17.9119875Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:13:17.9121556Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_nccl.py', '--shard-id=3', '--num-shards=3', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:13:17.911668] 2025-12-04T12:24:58.9905622Z 2025-12-04T12:24:58.9907019Z distributed/test_c10d_nccl 3/3 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_nccl_3.3_41c01794b25a1cc6_.log 2025-12-04T12:24:58.9944605Z Running 72 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLInitTest::test_init_wo_backend_str, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_abort_in_destroy_mixed_empty_pgs, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_eager_init_subgroup, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_split_group_mixed_backend, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extra_cuda_context, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extra_cuda_context_sync_ops, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_get_uid, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nan_assert_float32, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nccl_dist_backend_error, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_non_blocking_with_eager_init, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_restart_pg, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_set_nccl_pg_timeout_backend_nccl, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_flags, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_nccl_config, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_performance, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_vs_abort_reinit_performance, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_subgroup_p2p_eager_init_False, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_arbitrary_forward_return_value_grad_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_builtin_ddp_comm_hooks_nccl, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_channels_last_contig, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_dataclass_output_unused_param, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_True, 
test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_False, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_weight_sharing, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_comm_hook_allreduce_hook_nccl, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_complex_params_and_grads, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_packed_sequence, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_failure_recovery, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_grad_is_view_debug_info, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_fp16_compress_wrapper_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_multiple_outputs_multiple_backward_grad_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_multi_device_ids_not_allowed, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_sync_batch_norm_empty_input, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_sync_batch_norm_only_empty_input, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_blocking_wait_with_barrier, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_errors_blocking, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_send_recv_non_dense_tensor, test/distributed/test_c10d_nccl.py::NcclUserBufferRegistrationTest::test_nccl_user_buffer_registration, test/distributed/test_c10d_nccl.py::CommTest::test_intra_node_comm_all_reduce, test/distributed/test_c10d_nccl.py::CommTest::test_nccl_warn_not_in_group_debug_detail, test/distributed/test_c10d_nccl.py::CommTest::test_nccl_warn_not_in_group_debug_info, test/distributed/test_c10d_nccl.py::CommTest::test_sequence_num_set_default_pg_nccl, test/distributed/test_c10d_nccl.py::CommTest::test_tensor_dtype_complex, test/distributed/test_c10d_nccl.py::CommTest::test_tensor_dtype_mismatch, test/distributed/test_c10d_nccl.py::CommTest::test_time_estimate_nccl, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_all_to_all_single, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allgather_float8_float8_e4m3fn, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allreduce_coalesced, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends, test/distributed/test_c10d_nccl.py::LargeCommTest::test_broadcast_object_list_subgroup_set_device1_group_rank_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_gather_object_subgroup_group_rank_True, test/distributed/test_c10d_nccl.py::LargeCommTest::test_new_group_local_sync_duplicated_pg, 
test/distributed/test_c10d_nccl.py::LargeCommTest::test_new_group_local_sync_sanity_check, test/distributed/test_c10d_nccl.py::LargeCommTest::test_reduce_subgroup_group_rank_True, test/distributed/test_c10d_nccl.py::LargeCommTest::test_scatter_subgroup_group_rank_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_subgroup_group_rank_False_async_op_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_allgather_uneven_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_barrier_profiling, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_batched_send_recv_op_sizes_per_coalesce0_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_coalescing_manager_collective_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_coalescing_manager_collective_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_reset_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_individual_send_recv_op_sizes1_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_json_timing_enabled_False_include_collectives_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_json_timing_enabled_True_include_collectives_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_active_timing_enabled_False_only_active_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_stuck_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_stuck_timing_enabled_True 2025-12-04T12:24:58.9980482Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLInitTest::test_init_wo_backend_str 2025-12-04T12:24:58.9981657Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_abort_in_destroy_mixed_empty_pgs 2025-12-04T12:24:58.9982843Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_eager_init_subgroup 2025-12-04T12:24:58.9984014Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_split_group_mixed_backend 2025-12-04T12:24:58.9985159Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extra_cuda_context 2025-12-04T12:24:58.9986318Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extra_cuda_context_sync_ops 2025-12-04T12:24:58.9987427Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_get_uid 2025-12-04T12:24:58.9988448Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nan_assert_float32 2025-12-04T12:24:58.9989623Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nccl_dist_backend_error 2025-12-04T12:24:58.9990775Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_non_blocking_with_eager_init 2025-12-04T12:24:58.9991835Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_restart_pg 2025-12-04T12:24:58.9992920Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_set_nccl_pg_timeout_backend_nccl 2025-12-04T12:24:58.9994031Z Running 1 items in this shard: 
test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_flags 2025-12-04T12:24:58.9995109Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_nccl_config 2025-12-04T12:24:58.9996260Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_performance 2025-12-04T12:24:58.9997465Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_vs_abort_reinit_performance 2025-12-04T12:24:58.9998685Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_subgroup_p2p_eager_init_False 2025-12-04T12:24:58.9999933Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_arbitrary_forward_return_value_grad_is_view 2025-12-04T12:24:59.0001183Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_builtin_ddp_comm_hooks_nccl 2025-12-04T12:24:59.0002307Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_channels_last_contig 2025-12-04T12:24:59.0003458Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_dataclass_output_unused_param 2025-12-04T12:24:59.0004715Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_True 2025-12-04T12:24:59.0006096Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_False 2025-12-04T12:24:59.0007540Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_True 2025-12-04T12:24:59.0008904Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_True 2025-12-04T12:24:59.0010215Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_weight_sharing 2025-12-04T12:24:59.0011553Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_True 2025-12-04T12:24:59.0012946Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_True 2025-12-04T12:24:59.0014255Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_comm_hook_allreduce_hook_nccl 2025-12-04T12:24:59.0015443Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_complex_params_and_grads 2025-12-04T12:24:59.0016648Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_packed_sequence 2025-12-04T12:24:59.0017976Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_failure_recovery 2025-12-04T12:24:59.0019238Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_grad_is_view_debug_info 2025-12-04T12:24:59.0020562Z Running 1 items in this shard: 
test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_fp16_compress_wrapper_is_view 2025-12-04T12:24:59.0022062Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_multiple_outputs_multiple_backward_grad_is_view 2025-12-04T12:24:59.0023444Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_multi_device_ids_not_allowed 2025-12-04T12:24:59.0024722Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_sync_batch_norm_empty_input 2025-12-04T12:24:59.0025950Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_sync_batch_norm_only_empty_input 2025-12-04T12:24:59.0027158Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_blocking_wait_with_barrier 2025-12-04T12:24:59.0028344Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_errors_blocking 2025-12-04T12:24:59.0029422Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_send_recv_non_dense_tensor 2025-12-04T12:24:59.0030594Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclUserBufferRegistrationTest::test_nccl_user_buffer_registration 2025-12-04T12:24:59.0031705Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_intra_node_comm_all_reduce 2025-12-04T12:24:59.0032816Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_nccl_warn_not_in_group_debug_detail 2025-12-04T12:24:59.0033853Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_nccl_warn_not_in_group_debug_info 2025-12-04T12:24:59.0034790Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_sequence_num_set_default_pg_nccl 2025-12-04T12:24:59.0035661Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_tensor_dtype_complex 2025-12-04T12:24:59.0036485Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_tensor_dtype_mismatch 2025-12-04T12:24:59.0037307Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_time_estimate_nccl 2025-12-04T12:24:59.0038300Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_all_to_all_single 2025-12-04T12:24:59.0039565Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allgather_float8_float8_e4m3fn 2025-12-04T12:24:59.0040844Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allreduce_coalesced 2025-12-04T12:24:59.0042148Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends 2025-12-04T12:24:59.0043427Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_broadcast_object_list_subgroup_set_device1_group_rank_False 2025-12-04T12:24:59.0044527Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_gather_object_subgroup_group_rank_True 2025-12-04T12:24:59.0045520Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_new_group_local_sync_duplicated_pg 
2025-12-04T12:24:59.0046488Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_new_group_local_sync_sanity_check 2025-12-04T12:24:59.0047526Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_reduce_subgroup_group_rank_True 2025-12-04T12:24:59.0048675Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_scatter_subgroup_group_rank_False 2025-12-04T12:24:59.0049777Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_subgroup_group_rank_False_async_op_True 2025-12-04T12:24:59.0050890Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_allgather_uneven_timing_enabled_True 2025-12-04T12:24:59.0051854Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_barrier_profiling 2025-12-04T12:24:59.0052931Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_batched_send_recv_op_sizes_per_coalesce0_timing_enabled_True 2025-12-04T12:24:59.0054162Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_coalescing_manager_collective_timing_enabled_False 2025-12-04T12:24:59.0055348Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_coalescing_manager_collective_timing_enabled_True 2025-12-04T12:24:59.0056573Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_reset_timing_enabled_True 2025-12-04T12:24:59.0057900Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_individual_send_recv_op_sizes1_timing_enabled_False 2025-12-04T12:24:59.0059191Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_json_timing_enabled_False_include_collectives_False 2025-12-04T12:24:59.0060512Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_json_timing_enabled_True_include_collectives_True 2025-12-04T12:24:59.0061816Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_active_timing_enabled_False_only_active_True 2025-12-04T12:24:59.0063041Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_stuck_timing_enabled_False 2025-12-04T12:24:59.0064205Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_stuck_timing_enabled_True 2025-12-04T12:24:59.0064834Z 2025-12-04T12:24:59.0065206Z Finished distributed/test_c10d_nccl 3/3 ... 
[2025-12-04 12:24:58.991818][12730.599734204], took 11.68min 2025-12-04T12:24:59.0374462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4e483f68cef17162.xml 2025-12-04T12:24:59.1208017Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-05f5b130753b2983.xml 2025-12-04T12:24:59.1549683Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e16e53ef8db6995.xml 2025-12-04T12:24:59.1918954Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1e281dcef1930575.xml 2025-12-04T12:24:59.2253147Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2b466e71a200bcdc.xml 2025-12-04T12:24:59.2546588Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-325c8a002e1c83a2.xml 2025-12-04T12:24:59.2849425Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c0b6a576b76efd0.xml 2025-12-04T12:24:59.3174216Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e47f2e15272edbaf.xml 2025-12-04T12:24:59.3485265Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a9e19469eb1a06d4.xml 2025-12-04T12:24:59.3816129Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-df7444533096a1d8.xml 2025-12-04T12:24:59.4127239Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d87d87bc823f3dba.xml 2025-12-04T12:24:59.4477674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4a50a5ac8cd03017.xml 2025-12-04T12:24:59.4741692Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0ae50f0e1c874ad8.xml 2025-12-04T12:24:59.5055831Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7dbf8411ea4b6ce3.xml 2025-12-04T12:24:59.5410094Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2a6114c53cde50d7.xml 2025-12-04T12:24:59.5735906Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d109d91d9cd820a7.xml 2025-12-04T12:24:59.6046974Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e589af2daee12d3.xml 2025-12-04T12:24:59.6539816Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ff536a30913e6717.xml 2025-12-04T12:24:59.6824432Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-16e8bb0ec51136f2.xml 2025-12-04T12:24:59.7142589Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-688fcf4f5f0deff2.xml 2025-12-04T12:24:59.7495275Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c2f4984a060c2ce4.xml 2025-12-04T12:24:59.7835568Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4874c9e324e6599b.xml 2025-12-04T12:24:59.8196763Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-81b232fd98a6eda2.xml 2025-12-04T12:24:59.8524451Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-dbedd4dfa730b471.xml 2025-12-04T12:24:59.8977195Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e94fe5aed063a3e7.xml 2025-12-04T12:24:59.9299529Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-191142456fb777f7.xml 2025-12-04T12:24:59.9646329Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d909bdccb7ddf2c0.xml 2025-12-04T12:24:59.9956648Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e3a4388e42e1415.xml 2025-12-04T12:25:00.0278645Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5f42a263385a17.xml 2025-12-04T12:25:00.0578581Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a6537375079d62ca.xml 2025-12-04T12:25:00.0923255Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-515a3b961a30c93e.xml 2025-12-04T12:25:00.1276141Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-247b406154c62e2b.xml 2025-12-04T12:25:00.1565573Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-54fc92777b10ce8b.xml 2025-12-04T12:25:00.2074938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-07a5e82fccbcefb0.xml 2025-12-04T12:25:00.2407347Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-98372eb164ddb8a6.xml 2025-12-04T12:25:00.2754616Z Parsing testcases for 
test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9a91f2cdfa9f567b.xml 2025-12-04T12:25:00.3058690Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-578f1554447ed157.xml 2025-12-04T12:25:00.3357162Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-cba9e46262707896.xml 2025-12-04T12:25:00.3714411Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-b5cc6836ef1a3879.xml 2025-12-04T12:25:00.4022641Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1a086feba79f79de.xml 2025-12-04T12:25:00.4460983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fd712f2413b91025.xml 2025-12-04T12:25:00.4820041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e275020a83607d9.xml 2025-12-04T12:25:00.5165930Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-32cb996256d67719.xml 2025-12-04T12:25:00.5516629Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-281110f64c593b33.xml 2025-12-04T12:25:00.5820084Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ab551cc6e4b8fc0e.xml 2025-12-04T12:25:00.6154665Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-bb4b38110c51be7b.xml 2025-12-04T12:25:00.6455314Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d76cceb106b5a87a.xml 2025-12-04T12:25:00.6797191Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f5087c7fb2c85ea4.xml 2025-12-04T12:25:00.7115367Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5bf92e22e16000ae.xml 2025-12-04T12:25:00.7477404Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a2df2e6eff7daa02.xml 2025-12-04T12:25:00.7955799Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-62cf8d48558e6611.xml 2025-12-04T12:25:00.8273054Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-008b4e727f5be082.xml 2025-12-04T12:25:00.8581012Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0b38d08cedf93968.xml 2025-12-04T12:25:00.8900905Z Parsing 
testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0615767c47cb824b.xml 2025-12-04T12:25:00.9217067Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3a85b82e41e52e7b.xml 2025-12-04T12:25:00.9564656Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-670c4eb9ad8ac35a.xml 2025-12-04T12:25:00.9886131Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1ae993f40739468a.xml 2025-12-04T12:25:01.0169713Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1379655e313056b3.xml 2025-12-04T12:25:01.0496585Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17d32ccc8ec15e49.xml 2025-12-04T12:25:01.0814165Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5afe3c6d472874.xml 2025-12-04T12:25:01.1117985Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-71d8c77dbd2b6cd3.xml 2025-12-04T12:25:01.1593562Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9e93da4b49ea34dc.xml 2025-12-04T12:25:01.1956773Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-09fe633d76933c88.xml 2025-12-04T12:25:01.2287189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4db84368319deb77.xml 2025-12-04T12:25:01.2604569Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-867c58ec01067ba4.xml 2025-12-04T12:25:01.2923631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f4ea20dbc7c23240.xml 2025-12-04T12:25:01.3275510Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-197b01c054eb8425.xml 2025-12-04T12:25:01.3610929Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5f78ef08e5f67618.xml 2025-12-04T12:25:01.3937949Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5dd09e666c5e73ac.xml 2025-12-04T12:25:01.4258729Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8d5b24102af3938b.xml 2025-12-04T12:25:01.4588569Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7ed88178415e82af.xml 
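The "Parsing testcases for test report" entries above refer to the pytest JUnit-style XML files written under test/test-reports being read back to collect per-test results. The actual parser used by the CI scripts is not shown in this log; below is only a minimal sketch, assuming the standard testsuite/testcase layout that pytest's --junitxml output produces, with an illustrative file path.

```python
# Minimal sketch: count testcases and failures in a pytest junitxml report.
# Assumes the standard <testsuite>/<testcase> layout; the path is illustrative
# and not taken from this job. This is not the workflow's actual parser.
import xml.etree.ElementTree as ET

def summarize_report(path: str) -> dict:
    root = ET.parse(path).getroot()
    # Reports may have a <testsuites> wrapper or a bare <testsuite> root.
    total = failures = errors = skipped = 0
    for suite in root.iter("testsuite"):
        for case in suite.iter("testcase"):
            total += 1
            failures += len(case.findall("failure"))
            errors += len(case.findall("error"))
            skipped += len(case.findall("skipped"))
    return {"tests": total, "failures": failures, "errors": errors, "skipped": skipped}

if __name__ == "__main__":
    print(summarize_report("test-reports/example-report.xml"))
```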
2025-12-04T12:25:01.4923372Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17ddadec6a584fc8.xml
2025-12-04T12:25:02.5381993Z Uploading artifacts took 0.97 seconds
2025-12-04T12:25:06.6694900Z Running test batch 'tests to run' cost 11917.7 seconds
2025-12-04T12:25:06.6701052Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.6704361Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_44734394d10c11f08e600242ac110002
2025-12-04T12:25:06.7645302Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_44734394d10c11f08e600242ac110002
2025-12-04T12:25:06.7653655Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.7654989Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4481b262d10c11f08e600242ac110002
2025-12-04T12:25:06.8042925Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4481b262d10c11f08e600242ac110002
2025-12-04T12:25:06.8049373Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.8050279Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4487bca2d10c11f08e600242ac110002
2025-12-04T12:25:06.8365591Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4487bca2d10c11f08e600242ac110002
2025-12-04T12:25:06.8371485Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.8372352Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_448caaa0d10c11f08e600242ac110002
2025-12-04T12:25:06.8713486Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_448caaa0d10c11f08e600242ac110002
2025-12-04T12:25:06.8717810Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.8718765Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4491f8c0d10c11f08e600242ac110002
2025-12-04T12:25:06.9043351Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4491f8c0d10c11f08e600242ac110002
2025-12-04T12:25:06.9047753Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.9048566Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_44970374d10c11f08e600242ac110002
2025-12-04T12:25:06.9323463Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_44970374d10c11f08e600242ac110002
2025-12-04T12:25:06.9324509Z distributed/fsdp/test_fsdp_overlap 1/1 failed!
2025-12-04T12:25:06.9324967Z distributed/fsdp/test_fsdp_pure_fp16 1/1 failed!
2025-12-04T12:25:06.9325420Z distributed/fsdp/test_fsdp_exec_order 1/1 failed!
2025-12-04T12:25:06.9326014Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 failed!
2025-12-04T12:25:06.9326614Z distributed/fsdp/test_fsdp_clip_grad_norm 1/1 failed!
2025-12-04T12:25:06.9327027Z distributed/fsdp/test_fsdp_core 2/2 failed!
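Each "Emitting td_test_failure_stats_v2" entry above writes a single metrics document to the ossci-raw-job-status S3 bucket, keyed by what appears to be an epoch timestamp plus a unique suffix. A rough sketch of that pattern with boto3 follows; the payload fields and key layout are assumptions for illustration, not the workflow's actual uploader.

```python
# Rough sketch: write one JSON metrics document to S3, mirroring the
# "Writing 1 documents to S3 ossci-raw-job-status/..." entries above.
# Payload fields and key construction are illustrative assumptions.
import json
import time
import uuid
import boto3

def emit_metric(doc: dict, bucket: str = "ossci-raw-job-status") -> str:
    key = f"ossci_uploaded_metrics/td_test_failure_stats_v2_{int(time.time())}_{uuid.uuid4().hex}"
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(doc).encode("utf-8"))
    return key

# Hypothetical usage:
# emit_metric({"test_file": "distributed/fsdp/test_fsdp_core", "failed": True})
```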
2025-12-04T12:25:07.7495632Z
2025-12-04T12:25:07.7496127Z real 198m44.462s
2025-12-04T12:25:07.7496656Z user 425m27.491s
2025-12-04T12:25:07.7497098Z sys 243m29.884s
2025-12-04T12:25:07.7497383Z + sccache_epilogue
2025-12-04T12:25:07.7497710Z + echo '::group::Sccache Compilation Log'
2025-12-04T12:25:07.7498407Z ##[group]Sccache Compilation Log
2025-12-04T12:25:07.7498826Z + echo '=================== sccache compilation log ==================='
2025-12-04T12:25:07.7499312Z =================== sccache compilation log ===================
2025-12-04T12:25:07.7500387Z + python /var/lib/jenkins/workspace/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log
2025-12-04T12:25:07.7628802Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
2025-12-04T12:25:07.7629644Z =========== If your build fails, please take a look at the log above for possible reasons ===========
2025-12-04T12:25:07.7630219Z + sccache --show-stats
2025-12-04T12:25:07.7656695Z Compile requests 532
2025-12-04T12:25:07.7657192Z Compile requests executed 12
2025-12-04T12:25:07.7657743Z Cache hits 6
2025-12-04T12:25:07.7658120Z Cache hits (C/C++) 6
2025-12-04T12:25:07.7658459Z Cache misses 6
2025-12-04T12:25:07.7658811Z Cache misses (C/C++) 6
2025-12-04T12:25:07.7659173Z Cache hits rate 50.00 %
2025-12-04T12:25:07.7659546Z Cache hits rate (C/C++) 50.00 %
2025-12-04T12:25:07.7659910Z Cache timeouts 0
2025-12-04T12:25:07.7660274Z Cache read errors 0
2025-12-04T12:25:07.7660633Z Forced recaches 0
2025-12-04T12:25:07.7660973Z Cache write errors 0
2025-12-04T12:25:07.7661497Z Cache errors 0
2025-12-04T12:25:07.7661852Z Compilations 6
2025-12-04T12:25:07.7662215Z Compilation failures 0
2025-12-04T12:25:07.7662591Z Non-cacheable compilations 0
2025-12-04T12:25:07.7662951Z Non-cacheable calls 13
2025-12-04T12:25:07.7663318Z Non-compilation calls 507
2025-12-04T12:25:07.7663690Z Unsupported compiler calls 0
2025-12-04T12:25:07.7664068Z Average cache write 0.046 s
2025-12-04T12:25:07.7664434Z Average compiler 3.827 s
2025-12-04T12:25:07.7664809Z Average cache read hit 0.020 s
2025-12-04T12:25:07.7665197Z Failed distributed compilations 0
2025-12-04T12:25:07.7665452Z
2025-12-04T12:25:07.7665566Z Non-cacheable reasons:
2025-12-04T12:25:07.7665874Z -E 7
2025-12-04T12:25:07.7666246Z unknown source language 6
2025-12-04T12:25:07.7666486Z
2025-12-04T12:25:07.7666826Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T12:25:07.7667362Z Version (client) 0.10.0
2025-12-04T12:25:07.7667725Z + sccache --stop-server
2025-12-04T12:25:07.7681902Z Stopping sccache server...
2025-12-04T12:25:07.7682343Z Compile requests 532
2025-12-04T12:25:07.7682725Z Compile requests executed 12
2025-12-04T12:25:07.7683099Z Cache hits 6
2025-12-04T12:25:07.7683448Z Cache hits (C/C++) 6
2025-12-04T12:25:07.7683777Z Cache misses 6
2025-12-04T12:25:07.7684122Z Cache misses (C/C++) 6
2025-12-04T12:25:07.7684477Z Cache hits rate 50.00 %
2025-12-04T12:25:07.7684837Z Cache hits rate (C/C++) 50.00 %
2025-12-04T12:25:07.7685197Z Cache timeouts 0
2025-12-04T12:25:07.7685544Z Cache read errors 0
2025-12-04T12:25:07.7685874Z Forced recaches 0
2025-12-04T12:25:07.7686230Z Cache write errors 0
2025-12-04T12:25:07.7686576Z Cache errors 0
2025-12-04T12:25:07.7686908Z Compilations 6
2025-12-04T12:25:07.7687263Z Compilation failures 0
2025-12-04T12:25:07.7687627Z Non-cacheable compilations 0
2025-12-04T12:25:07.7687989Z Non-cacheable calls 13
2025-12-04T12:25:07.7688330Z Non-compilation calls 507
2025-12-04T12:25:07.7688698Z Unsupported compiler calls 0
2025-12-04T12:25:07.7689068Z Average cache write 0.046 s
2025-12-04T12:25:07.7689425Z Average compiler 3.827 s
2025-12-04T12:25:07.7689793Z Average cache read hit 0.020 s
2025-12-04T12:25:07.7690436Z Failed distributed compilations 0
2025-12-04T12:25:07.7690686Z
2025-12-04T12:25:07.7690797Z Non-cacheable reasons:
2025-12-04T12:25:07.7691095Z -E 7
2025-12-04T12:25:07.7691451Z unknown source language 6
2025-12-04T12:25:07.7691687Z
2025-12-04T12:25:07.7691961Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T12:25:07.7692464Z Version (client) 0.10.0
2025-12-04T12:25:07.7692825Z + echo ::endgroup::
2025-12-04T12:25:07.7693627Z ##[endgroup]
2025-12-04T12:25:07.7693888Z + cleanup_workspace
2025-12-04T12:25:07.7694723Z + echo 'sudo may print the following warning message that can be ignored. The chown command will still run.'
2025-12-04T12:25:07.7695906Z sudo may print the following warning message that can be ignored. The chown command will still run.
2025-12-04T12:25:07.7696875Z + echo ' sudo: setrlimit(RLIMIT_STACK): Operation not permitted'
2025-12-04T12:25:07.7697486Z sudo: setrlimit(RLIMIT_STACK): Operation not permitted
2025-12-04T12:25:07.7698278Z + echo 'For more details refer to https://github.com/sudo-project/sudo/issues/42'
2025-12-04T12:25:07.7699245Z For more details refer to https://github.com/sudo-project/sudo/issues/42
2025-12-04T12:25:07.7700100Z + sudo chown -R 1000 /var/lib/jenkins/workspace
2025-12-04T12:25:08.4372705Z ##[error]Process completed with exit code 1.
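Two quick sanity checks on the epilogue above, using only numbers copied from the log: the sccache counters (6 hits, 6 misses) match the reported 50.00 % hit rate, with most of the 532 compile requests being non-compilation calls that do not count toward it; and the `time` output (real 198m44s versus user 425m27s plus sys 243m30s) implies roughly 3.4 CPU-minutes of work per wall-clock minute during the test batch.

```python
# Sanity checks on the figures printed above (values copied from the log).
hits, misses = 6, 6
hit_rate = 100.0 * hits / (hits + misses)
print(f"Cache hits rate {hit_rate:.2f} %")           # -> 50.00 %

real_min = 198 + 44.462 / 60                          # wall-clock time
cpu_min = (425 + 27.491 / 60) + (243 + 29.884 / 60)   # user + sys
print(f"Average parallelism ~{cpu_min / real_min:.1f}x")  # -> ~3.4x
```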
2025-12-04T12:25:08.4446240Z Prepare all required actions
2025-12-04T12:25:08.4446654Z Getting action download info
2025-12-04T12:25:08.6295324Z ##[group]Run ./.github/actions/pytest-cache-upload
2025-12-04T12:25:08.6296017Z with:
2025-12-04T12:25:08.6296255Z cache_dir: .pytest_cache
2025-12-04T12:25:08.6296617Z shard: 3
2025-12-04T12:25:08.6297052Z sha: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T12:25:08.6297459Z test_config: distributed
2025-12-04T12:25:08.6297858Z job_identifier: trunk_linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T12:25:08.6298289Z env:
2025-12-04T12:25:08.6298521Z GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:08.6298824Z HAS_NVIDIA_GPU: true
2025-12-04T12:25:08.6299190Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:08.6299821Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:08.6300406Z ##[endgroup]
2025-12-04T12:25:08.6336515Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T12:25:08.6337047Z with:
2025-12-04T12:25:08.6337269Z shell: bash
2025-12-04T12:25:08.6337528Z timeout_minutes: 5
2025-12-04T12:25:08.6337816Z max_attempts: 5
2025-12-04T12:25:08.6338077Z retry_wait_seconds: 30
2025-12-04T12:25:08.6338469Z command: set -eu python3 -m pip install boto3==1.35.42
2025-12-04T12:25:08.6338912Z polling_interval_seconds: 1
2025-12-04T12:25:08.6339237Z warning_on_retry: true
2025-12-04T12:25:08.6339524Z continue_on_error: false
2025-12-04T12:25:08.6339812Z env:
2025-12-04T12:25:08.6340054Z GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:08.6340353Z HAS_NVIDIA_GPU: true
2025-12-04T12:25:08.6340714Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:08.6341370Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:08.6341939Z ##[endgroup]
2025-12-04T12:25:08.9873936Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T12:25:10.1962623Z Collecting boto3==1.35.42
2025-12-04T12:25:10.2132753Z Downloading boto3-1.35.42-py3-none-any.whl (139 kB)
2025-12-04T12:25:11.5088208Z Collecting botocore<1.36.0,>=1.35.42
2025-12-04T12:25:11.5131295Z Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB)
2025-12-04T12:25:11.7100541Z Collecting s3transfer<0.11.0,>=0.10.0
2025-12-04T12:25:11.7140500Z Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB)
2025-12-04T12:25:11.7185849Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3.9/site-packages (from boto3==1.35.42) (0.10.0)
2025-12-04T12:25:11.7248279Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (2.8.1)
2025-12-04T12:25:11.7255184Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.25.10)
2025-12-04T12:25:11.8746471Z Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.15.0)
2025-12-04T12:25:11.9714826Z Installing collected packages: botocore, s3transfer, boto3
2025-12-04T12:25:12.5525174Z Successfully installed boto3-1.35.42 botocore-1.35.99 s3transfer-0.10.4
2025-12-04T12:25:12.7182934Z Command completed after 1 attempt(s).
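The nick-fields/retry step above wraps the pinned boto3 install with max_attempts: 5 and retry_wait_seconds: 30, and the install succeeded on the first attempt. A comparable retry loop, sketched in Python rather than the action's own JavaScript, might look like the following; the command string and timings simply mirror the step's inputs.

```python
# Sketch of a retry wrapper comparable to the nick-fields/retry step above:
# up to 5 attempts with a 30-second pause between failures. Illustrative only,
# not the action's implementation.
import subprocess
import time

def run_with_retries(cmd: str, max_attempts: int = 5, retry_wait_seconds: int = 30) -> None:
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, shell=True)
        if result.returncode == 0:
            print(f"Command completed after {attempt} attempt(s).")
            return
        if attempt < max_attempts:
            time.sleep(retry_wait_seconds)
    raise RuntimeError(f"Command failed after {max_attempts} attempts")

# Hypothetical usage matching the logged step:
# run_with_retries("python3 -m pip install boto3==1.35.42")
```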
2025-12-04T12:25:12.7239372Z ##[group]Run python3 .github/scripts/pytest_cache.py \
2025-12-04T12:25:12.7239843Z python3 .github/scripts/pytest_cache.py \
2025-12-04T12:25:12.7240204Z   --upload \
2025-12-04T12:25:12.7240531Z   --cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \
2025-12-04T12:25:12.7240947Z   --pr_identifier "$GITHUB_REF" \
2025-12-04T12:25:12.7241309Z   --job_identifier "$JOB_IDENTIFIER" \
2025-12-04T12:25:12.7241656Z   --sha "$SHA" \
2025-12-04T12:25:12.7242031Z   --test_config "$TEST_CONFIG" \
2025-12-04T12:25:12.7242366Z   --shard "$SHARD" \
2025-12-04T12:25:12.7242645Z   --repo "$REPO" \
2025-12-04T12:25:12.7243083Z   --temp_dir "$RUNNER_TEMP" \
2025-12-04T12:25:12.7252501Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T12:25:12.7252884Z env:
2025-12-04T12:25:12.7253108Z   GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:12.7253388Z   HAS_NVIDIA_GPU: true
2025-12-04T12:25:12.7253702Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:12.7254284Z   DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:12.7254819Z   CACHE_DIR: .pytest_cache
2025-12-04T12:25:12.7255183Z   JOB_IDENTIFIER: trunk_linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T12:25:12.7255597Z   SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T12:25:12.7255950Z   TEST_CONFIG: distributed
2025-12-04T12:25:12.7256218Z   SHARD: 3
2025-12-04T12:25:12.7256567Z   REPO: pytorch/pytorch
2025-12-04T12:25:12.7257007Z ##[endgroup]
2025-12-04T12:25:13.0829464Z PR identifier for `refs/heads/main` is `96e092540d6b3c4076e3d2bc6f1f9013`
2025-12-04T12:25:13.0831950Z Uploading cache with args Namespace(upload=True, download=False, cache_dir='/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache', pr_identifier='refs/heads/main', job_identifier='trunk_linux-jammy-cuda12.8-py3.10-gcc11', sha='ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32', test_config='distributed', shard='3', repo='pytorch/pytorch', temp_dir='/home/ec2-user/actions-runner/_work/_temp', bucket=None)
2025-12-04T12:25:13.0834206Z Zipping /home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache
2025-12-04T12:25:13.0835657Z to /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/distributed/3
2025-12-04T12:25:13.0837990Z Uploading /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/distributed/3.zip
2025-12-04T12:25:13.0840039Z to s3://gha-artifacts/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/distributed/3.zip
2025-12-04T12:25:13.1315360Z ##[group]Run cat test/**/*_toprint.log || true
2025-12-04T12:25:13.1315774Z cat test/**/*_toprint.log || true
2025-12-04T12:25:13.1322286Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T12:25:13.1322715Z env:
2025-12-04T12:25:13.1322965Z   GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:13.1323394Z   HAS_NVIDIA_GPU: true
2025-12-04T12:25:13.1323765Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:13.1324422Z   DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:13.1325015Z ##[endgroup]
2025-12-04T12:25:13.1419016Z cat: 'test/**/*_toprint.log': No such file or directory
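
For orientation, the step above zips the .pytest_cache directory and pushes the archive to the gha-artifacts bucket under a key built from the PR identifier, job identifier, SHA, test config, and shard. A simplified, illustrative sketch of that flow using the boto3 just installed (the helper names and temp path below are assumptions; .github/scripts/pytest_cache.py is the authoritative implementation):

# Illustrative sketch of the zip-and-upload flow logged above; not the real script.
import zipfile
from pathlib import Path

import boto3

def zip_directory(src_dir: Path, dest_zip: Path) -> Path:
    # Recursively add every file under src_dir to a deflated zip archive.
    dest_zip.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in src_dir.rglob("*"):
            if path.is_file():
                zf.write(path, path.relative_to(src_dir))
    return dest_zip

def upload_to_s3(zip_path: Path, bucket: str, key: str) -> None:
    boto3.client("s3").upload_file(str(zip_path), bucket, key)

# Example values taken from the log lines above.
cache_dir = Path("/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache")
key = ("pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/"
       "trunk_linux-jammy-cuda12_8-py3_10-gcc11/"
       "ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/distributed/3.zip")
zip_path = zip_directory(cache_dir, Path("/tmp") / Path(key).name)
upload_to_s3(zip_path, "gha-artifacts", key)
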
2025-12-04T12:25:13.1447643Z ##[group]Run kill "$MONITOR_SCRIPT_PID"
2025-12-04T12:25:13.1448034Z kill "$MONITOR_SCRIPT_PID"
2025-12-04T12:25:13.1453718Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T12:25:13.1454093Z env:
2025-12-04T12:25:13.1454316Z   GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:13.1454592Z   HAS_NVIDIA_GPU: true
2025-12-04T12:25:13.1454903Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:13.1455477Z   DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:13.1456009Z   MONITOR_SCRIPT_PID: 62844
2025-12-04T12:25:13.1456299Z ##[endgroup]
2025-12-04T12:25:13.1480541Z /home/ec2-user/actions-runner/_work/_temp/15c09898-b395-4ad6-b513-93226678e011.sh: line 1: kill: (62844) - No such process
2025-12-04T12:25:13.1483199Z ##[error]Process completed with exit code 1.
2025-12-04T12:25:13.1621408Z Prepare all required actions
2025-12-04T12:25:13.1621898Z Getting action download info
2025-12-04T12:25:13.3454635Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T12:25:13.5737919Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-12-04T12:25:14.0760139Z ##[group]Run ./.github/actions/upload-test-artifacts
2025-12-04T12:25:14.0760517Z with:
2025-12-04T12:25:14.0760933Z   file-suffix: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904
2025-12-04T12:25:14.0761443Z   s3-bucket: gha-artifacts
2025-12-04T12:25:14.0761709Z env:
2025-12-04T12:25:14.0761930Z   GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:14.0762202Z   HAS_NVIDIA_GPU: true
2025-12-04T12:25:14.0762531Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:14.0763111Z   DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:14.0763665Z ##[endgroup]
2025-12-04T12:25:14.0789934Z ##[group]Run # Remove any previous test jsons if they exist
2025-12-04T12:25:14.0790448Z # Remove any previous test jsons if they exist
2025-12-04T12:25:14.0790849Z rm -f test-jsons-*.zip
2025-12-04T12:25:14.0791328Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json'
2025-12-04T12:25:14.0797244Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T12:25:14.0797631Z env:
2025-12-04T12:25:14.0797844Z   GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:14.0798121Z   HAS_NVIDIA_GPU: true
2025-12-04T12:25:14.0798448Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:14.0799015Z   DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:14.0799740Z   FILE_SUFFIX: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904
2025-12-04T12:25:14.0800241Z ##[endgroup]
2025-12-04T12:25:14.1038537Z adding: test/test-reports/td_exclusions-2f1c2264a3249442bd0a.json (deflated 86%)
2025-12-04T12:25:14.1039727Z adding: test/test-reports/python-pytest/distributed.test_c10d_functional_native/distributed.test_c10d_functional_native-369cc3de9e188dd1.json (deflated 92%)
2025-12-04T12:25:14.1041220Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-39c8c10a0ef1a34e.json (deflated 79%)
2025-12-04T12:25:14.1042646Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-bb36a88bac557029.json (deflated 79%)
2025-12-04T12:25:14.1044081Z adding:
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-9b6f6e417d9b4600.json (deflated 79%) 2025-12-04T12:25:14.1045651Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-83c25fe932c36613.json (stored 0%) 2025-12-04T12:25:14.1047079Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-e1278d34de852f2a.json (deflated 79%) 2025-12-04T12:25:14.1048524Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-efcb608498b7750d.json (deflated 79%) 2025-12-04T12:25:14.1049962Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9a300aee582fd0b6.json (deflated 79%) 2025-12-04T12:25:14.1051389Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-433868368b6a29b3.json (deflated 79%) 2025-12-04T12:25:14.1052819Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-cb48c540b8fb2acf.json (deflated 87%) 2025-12-04T12:25:14.1054254Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f306b72badd85355.json (deflated 79%) 2025-12-04T12:25:14.1055884Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-456a3faf0e1ca4c4.json (stored 0%) 2025-12-04T12:25:14.1057682Z adding: test/test-reports/python-pytest/distributed.tensor.debug.test_debug_mode/distributed.tensor.debug.test_debug_mode-21dd2989918f2f32.json (deflated 90%) 2025-12-04T12:25:14.1059242Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-93c7f0a0a61745d5.json (deflated 79%) 2025-12-04T12:25:14.1060757Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-50fd36707db41f77.json (deflated 79%) 2025-12-04T12:25:14.1062255Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-434f2a168fab2502.json (deflated 79%) 2025-12-04T12:25:14.1063757Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-810575b51f00acc3.json (deflated 79%) 2025-12-04T12:25:14.1065257Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-acd65444fa26961a.json (deflated 79%) 2025-12-04T12:25:14.1066762Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d7f6d912312cc834.json (deflated 79%) 2025-12-04T12:25:14.1068287Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d3fa58c4cf34965f.json (deflated 80%) 2025-12-04T12:25:14.1069864Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5b8ecd9108f02ac.json (deflated 88%) 2025-12-04T12:25:14.1071329Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-578e4c4077b7a803.json (deflated 80%) 2025-12-04T12:25:14.1072787Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14d4a314808f55fe.json (deflated 80%) 2025-12-04T12:25:14.1074248Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-72b90a4f7545df10.json (deflated 80%) 2025-12-04T12:25:14.1075704Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cc094df1219cfd82.json (deflated 91%) 2025-12-04T12:25:14.1077151Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-94627d53ab92538d.json (deflated 80%) 2025-12-04T12:25:14.1078620Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f49c40cee39994b2.json (deflated 80%) 2025-12-04T12:25:14.1080133Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a8869f6ed51873ac.json (deflated 80%) 2025-12-04T12:25:14.1081597Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-90a4ba7c1fd04d10.json (deflated 80%) 2025-12-04T12:25:14.1083072Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ccaa5b3b6bf09af7.json (deflated 80%) 2025-12-04T12:25:14.1084526Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ca39f8152ef39349.json (deflated 91%) 2025-12-04T12:25:14.1085989Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7178045a44a28781.json (deflated 79%) 2025-12-04T12:25:14.1087452Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cdb7b80b8b392fad.json (deflated 79%) 2025-12-04T12:25:14.1088993Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-9595731043617943.json (deflated 87%) 2025-12-04T12:25:14.1090476Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f8bd87b046fcc0d3.json (deflated 79%) 2025-12-04T12:25:14.1091938Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-68dc7893385d1617.json (deflated 79%) 2025-12-04T12:25:14.1093402Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14f8a536ecccf07e.json (deflated 79%) 2025-12-04T12:25:14.1094845Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-77e61ff77a3b19cd.json (stored 0%) 2025-12-04T12:25:14.1096461Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a78dec0d79621f36.json (deflated 80%) 2025-12-04T12:25:14.1098279Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9a14ac4718e66e44.json (deflated 80%) 2025-12-04T12:25:14.1099938Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7d115d367e840460.json (deflated 80%) 2025-12-04T12:25:14.1101611Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-724e16d7d24ec18b.json (deflated 80%) 2025-12-04T12:25:14.1103270Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1c81c8f34feb9c16.json (deflated 80%) 2025-12-04T12:25:14.1104926Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a326f09bb7c5e616.json (deflated 80%) 2025-12-04T12:25:14.1106594Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7096ae518bc839e.json (deflated 80%) 2025-12-04T12:25:14.1108266Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-dbe06a751e4355d9.json (deflated 80%) 2025-12-04T12:25:14.1109997Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7f21dedd43754e1.json (deflated 80%) 2025-12-04T12:25:14.1111631Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7dbc99509eb0f4ce.json (deflated 80%) 2025-12-04T12:25:14.1113287Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b4af92028672eb6.json (deflated 80%) 2025-12-04T12:25:14.1114885Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c67b11ef8bde4252.json (deflated 80%) 2025-12-04T12:25:14.1116508Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c057f5798619892b.json (deflated 80%) 2025-12-04T12:25:14.1118120Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-aae1a2ba6806c0ef.json (deflated 80%) 2025-12-04T12:25:14.1119732Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c34ce2d8050066e8.json (deflated 80%) 2025-12-04T12:25:14.1121716Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-fde5b3ce12e5a98a.json (deflated 88%) 2025-12-04T12:25:14.1123492Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b1cbedcab1229122.json (deflated 80%) 2025-12-04T12:25:14.1125221Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6d24496891daae4f.json (deflated 80%) 2025-12-04T12:25:14.1126880Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e815db3b6b0b67f1.json (deflated 87%) 2025-12-04T12:25:14.1128533Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-788cdb9001b436df.json (deflated 79%) 2025-12-04T12:25:14.1130179Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9601a812ff315158.json (deflated 79%) 
2025-12-04T12:25:14.1131849Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c4b6ce2b260b8d4b.json (deflated 79%) 2025-12-04T12:25:14.1133631Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-490a12d48ec816b9.json (deflated 79%) 2025-12-04T12:25:14.1135241Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2f9fc6fa3a79028.json (deflated 79%) 2025-12-04T12:25:14.1137083Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-384ab9a5685ff7be.json (stored 0%) 2025-12-04T12:25:14.1138683Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a06a4188d644524d.json (deflated 87%) 2025-12-04T12:25:14.1140268Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-03186403898f3bbb.json (deflated 87%) 2025-12-04T12:25:14.1141838Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a3dc994784795bc1.json (deflated 79%) 2025-12-04T12:25:14.1143407Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-b1d6139c1033a518.json (deflated 79%) 2025-12-04T12:25:14.1144980Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-ebdc3db326996caa.json (deflated 79%) 2025-12-04T12:25:14.1146537Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-c42bc725a7562377.json (deflated 79%) 2025-12-04T12:25:14.1148158Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-4818210284e31d5e.json (deflated 79%) 2025-12-04T12:25:14.1149813Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-1b5186457c75b3fb.json (deflated 87%) 2025-12-04T12:25:14.1151335Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-74e02afb5846363a.json (deflated 79%) 2025-12-04T12:25:14.1152846Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-39202840e4782b07.json (deflated 90%) 2025-12-04T12:25:14.1154365Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-067163aa862fde85.json (deflated 91%) 2025-12-04T12:25:14.1155899Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-adf2403f35f3c235.json (deflated 90%) 2025-12-04T12:25:14.1157466Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-36b91fd354097cab.json (stored 0%) 2025-12-04T12:25:14.1158928Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90a070d9a0caeaa7.json (deflated 79%) 2025-12-04T12:25:14.1160279Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b56b818e7dab969.json (deflated 87%) 2025-12-04T12:25:14.1161639Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2da5f79ab7711605.json (deflated 87%) 2025-12-04T12:25:14.1162997Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a202ac92fafcf85d.json (deflated 79%) 2025-12-04T12:25:14.1164367Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bacdfd4e137b31c0.json (deflated 87%) 2025-12-04T12:25:14.1165720Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2f84fddbafa0e0f3.json (deflated 79%) 2025-12-04T12:25:14.1167079Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8511307d41418b77.json (deflated 79%) 2025-12-04T12:25:14.1168440Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3768a5b2a44119fc.json (deflated 79%) 2025-12-04T12:25:14.1169818Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31ee953fde08a139.json (deflated 80%) 2025-12-04T12:25:14.1171177Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cf0a0887fe85c292.json (deflated 79%) 2025-12-04T12:25:14.1172536Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-07c27c95d6f3d3d6.json (deflated 79%) 2025-12-04T12:25:14.1173902Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ec3b2535e8e2ad7.json (deflated 79%) 2025-12-04T12:25:14.1175261Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c7bc1bec56d6360.json (deflated 87%) 2025-12-04T12:25:14.1176688Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1003ee713f2c1e3e.json (deflated 79%) 2025-12-04T12:25:14.1178244Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-86ef8482fc5a0e9d.json (deflated 79%) 2025-12-04T12:25:14.1179650Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e9238188d8477a2.json (deflated 87%) 2025-12-04T12:25:14.1181077Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9476e56094f0b738.json (deflated 79%) 2025-12-04T12:25:14.1182474Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-207ff9590d724b3a.json (deflated 79%) 2025-12-04T12:25:14.1183872Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f664e87214ff2805.json (deflated 79%) 2025-12-04T12:25:14.1185273Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-def950b7d24ceea9.json (deflated 79%) 2025-12-04T12:25:14.1186683Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-89dfbd7b5cd71317.json (deflated 79%) 2025-12-04T12:25:14.1188088Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bdae057bafb686b9.json (deflated 87%) 
2025-12-04T12:25:14.1189644Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-eb4953947b5f3ef2.json (deflated 79%) 2025-12-04T12:25:14.1191021Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-532f83d54e2054ff.json (deflated 91%) 2025-12-04T12:25:14.1192380Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3483d762b5b4fca1.json (deflated 79%) 2025-12-04T12:25:14.1193739Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c6b2032ef8ff1e94.json (deflated 87%) 2025-12-04T12:25:14.1195100Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5647de3303d26f02.json (deflated 79%) 2025-12-04T12:25:14.1196439Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cff7e7504b276d84.json (deflated 87%) 2025-12-04T12:25:14.1197807Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d2fb83ab3ccdeb6.json (deflated 79%) 2025-12-04T12:25:14.1199179Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd911142cc34300e.json (deflated 91%) 2025-12-04T12:25:14.1200530Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8e84025a0dc7a16.json (deflated 91%) 2025-12-04T12:25:14.1201886Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-392d2e7951c1c5f3.json (deflated 79%) 2025-12-04T12:25:14.1203230Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-477ee10c9167da98.json (deflated 91%) 2025-12-04T12:25:14.1204588Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96eeb012f5f596ba.json (deflated 79%) 2025-12-04T12:25:14.1205958Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc37fd9d84da442a.json (deflated 79%) 2025-12-04T12:25:14.1207325Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cbd7e5f481e859be.json (deflated 79%) 2025-12-04T12:25:14.1208674Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ede249f1a681285.json (deflated 79%) 2025-12-04T12:25:14.1210036Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-11be05c94e086d26.json (deflated 79%) 2025-12-04T12:25:14.1211396Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16966e8ed8e62900.json (deflated 79%) 2025-12-04T12:25:14.1212755Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90420efea6f00dc5.json (deflated 79%) 2025-12-04T12:25:14.1214477Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c9f36ab2b8b15ae.json (deflated 79%) 2025-12-04T12:25:14.1215847Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d4c1fd96adc2be7.json (deflated 87%) 2025-12-04T12:25:14.1217459Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-500277f28031837e.json 
(deflated 79%) 2025-12-04T12:25:14.1218856Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-942d56c07e16c88d.json (deflated 79%) 2025-12-04T12:25:14.1220261Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-55fdf9ad8e0a27f0.json (deflated 79%) 2025-12-04T12:25:14.1221819Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e1cdaa245647d1a.json (deflated 79%) 2025-12-04T12:25:14.1223342Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a996648fbbff19f5.json (deflated 79%) 2025-12-04T12:25:14.1224799Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc1573489c80017b.json (deflated 79%) 2025-12-04T12:25:14.1226193Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4d2b72d464b1c339.json (deflated 80%) 2025-12-04T12:25:14.1227587Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65dbafa4918c0ef1.json (deflated 80%) 2025-12-04T12:25:14.1228990Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8e1f7dea233320.json (deflated 91%) 2025-12-04T12:25:14.1230386Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d13641fc6f0b57c.json (deflated 80%) 2025-12-04T12:25:14.1231789Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29e66d82c97dbaa5.json (deflated 80%) 2025-12-04T12:25:14.1233283Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a798bbedf3e7b999.json (deflated 91%) 2025-12-04T12:25:14.1234636Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0d5d8a174cb3c98.json (deflated 88%) 2025-12-04T12:25:14.1235991Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-931d013fb4c2579a.json (deflated 91%) 2025-12-04T12:25:14.1237341Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-92646f491493cae0.json (deflated 80%) 2025-12-04T12:25:14.1238684Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8232c23afc6466e0.json (deflated 79%) 2025-12-04T12:25:14.1240032Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-983af60bcd722f1d.json (deflated 79%) 2025-12-04T12:25:14.1241395Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-84ede3fbd174dfda.json (deflated 79%) 2025-12-04T12:25:14.1242757Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9538bfd24f807d16.json (deflated 87%) 2025-12-04T12:25:14.1244121Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e7d2c56cd2be4bb.json (deflated 88%) 2025-12-04T12:25:14.1245471Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1378f62336ac1630.json (deflated 79%) 2025-12-04T12:25:14.1246874Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e092965a6aa7362.json (deflated 87%) 2025-12-04T12:25:14.1248232Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19aef0a0802c58a7.json (deflated 79%) 2025-12-04T12:25:14.1249591Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e8c70689f4db333.json (deflated 87%) 2025-12-04T12:25:14.1250935Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-389219a70e101b44.json (deflated 79%) 2025-12-04T12:25:14.1252290Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22aad73f608511a0.json (deflated 87%) 2025-12-04T12:25:14.1253643Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22bb81621d944803.json (deflated 79%) 2025-12-04T12:25:14.1254994Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e70588b2995dc7c5.json (deflated 79%) 2025-12-04T12:25:14.1256502Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b456a18c8ca9135a.json (deflated 79%) 2025-12-04T12:25:14.1258218Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aedba904eee3ba73.json (deflated 79%) 2025-12-04T12:25:14.1259617Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3d36f137cb39b5.json (deflated 79%) 2025-12-04T12:25:14.1261005Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-973a0dc84b27de93.json (deflated 79%) 2025-12-04T12:25:14.1262409Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e9342b39aaf3792.json (deflated 79%) 2025-12-04T12:25:14.1263813Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15b775a41cf5a439.json (deflated 79%) 2025-12-04T12:25:14.1265227Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56374ffd8bd068de.json (deflated 79%) 2025-12-04T12:25:14.1266612Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6288913bb010f746.json (deflated 79%) 2025-12-04T12:25:14.1268008Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d2350a2a3a63f23.json (deflated 79%) 2025-12-04T12:25:14.1269501Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee9779088060e0f5.json (deflated 87%) 2025-12-04T12:25:14.1270856Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a7aa8c4ec058e09.json (deflated 79%) 2025-12-04T12:25:14.1272208Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4f45a35aeec028b0.json (stored 0%) 2025-12-04T12:25:14.1273580Z adding: test/test-reports/python-pytest/distributed.algorithms.test_join/distributed.algorithms.test_join-346fdf8ca2d8d04c.json (deflated 89%) 2025-12-04T12:25:14.1275162Z adding: test/test-reports/python-pytest/distributed.pipelining.test_schedule_multiproc/distributed.pipelining.test_schedule_multiproc-4c892aab54fe07b4.json 
(deflated 94%) 2025-12-04T12:25:14.1276802Z adding: test/test-reports/python-pytest/distributed.test_compute_comm_reordering/distributed.test_compute_comm_reordering-5eeb11f30d43fbd8.json (deflated 87%) 2025-12-04T12:25:14.1278253Z adding: test/test-reports/python-pytest/distributed.test_cupy_as_tensor/distributed.test_cupy_as_tensor-9bf0be6a7af397ad.json (deflated 47%) 2025-12-04T12:25:14.1279580Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fx/distributed.fsdp.test_fsdp_fx-d8b89ec57f22953e.json (deflated 35%) 2025-12-04T12:25:14.1280948Z adding: test/test-reports/python-pytest/distributed._tools.test_sac_ilp/distributed._tools.test_sac_ilp-80280b96b0e30cba.json (deflated 76%) 2025-12-04T12:25:14.1282387Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_hf_storage/distributed.checkpoint.test_hf_storage-5c05eca826b12737.json (deflated 83%) 2025-12-04T12:25:14.1283933Z adding: test/test-reports/python-pytest/distributed.pipelining.test_microbatch/distributed.pipelining.test_microbatch-db2f7f262044cd4d.json (deflated 68%) 2025-12-04T12:25:14.1285476Z adding: test/test-reports/python-pytest/distributed.tensor.test_placement_types/distributed.tensor.test_placement_types-aa6a82bf337fac31.json (deflated 82%) 2025-12-04T12:25:14.1287116Z adding: test/test-reports/python-pytest/distributed.tensor.test_dtensor_dispatch_overhead/distributed.tensor.test_dtensor_dispatch_overhead-1be227e0f3a4b8ca.json (deflated 42%) 2025-12-04T12:25:14.1288985Z adding: test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpoint_reader/distributed.checkpoint._experimental.test_checkpoint_reader-e75c494c472cf9a1.json (deflated 78%) 2025-12-04T12:25:14.1290835Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_format_utils/distributed.checkpoint.test_format_utils-ff4efe8ffc0a39b9.json (deflated 74%) 2025-12-04T12:25:14.1292476Z adding: test/test-reports/python-pytest/distributed.test_aten_comm_compute_reordering/distributed.test_aten_comm_compute_reordering-8ab49fa352932ba1.json (deflated 91%) 2025-12-04T12:25:14.1294032Z adding: test/test-reports/python-pytest/distributed.tensor.test_redistribute/distributed.tensor.test_redistribute-02b614c0805e2900.json (deflated 92%) 2025-12-04T12:25:14.1295569Z adding: test/test-reports/python-pytest/distributed.tensor.parallel.test_tp_style/distributed.tensor.parallel.test_tp_style-3daa17d4beb2059f.json (deflated 90%) 2025-12-04T12:25:14.1297255Z adding: test/test-reports/python-pytest/distributed.tensor.test_api/distributed.tensor.test_api-143a55cc9757e18a.json (deflated 90%) 2025-12-04T12:25:14.1298696Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsspec/distributed.checkpoint.test_fsspec-2295d11b632387c0.json (deflated 69%) 2025-12-04T12:25:14.1300380Z adding: test/test-reports/python-pytest/distributed.tensor.experimental.test_tp_transform/distributed.tensor.experimental.test_tp_transform-af912528cabb656d.json (deflated 76%) 2025-12-04T12:25:14.1302070Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_traverse/distributed.checkpoint.test_traverse-f038bc92a00bd1c7.json (deflated 87%) 2025-12-04T12:25:14.1303583Z adding: test/test-reports/python-pytest/distributed.tensor.test_random_ops/distributed.tensor.test_random_ops-a8f6b522aa6434af.json (deflated 92%) 2025-12-04T12:25:14.1305224Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_logging/distributed._composable.fsdp.test_fully_shard_logging-7e09cae3d59aa65e.json 
(stored 0%) 2025-12-04T12:25:14.1306839Z adding: test/test-reports/python-pytest/distributed.launcher.test_api/distributed.launcher.test_api-15b87ceaa10651c5.json (deflated 63%) 2025-12-04T12:25:14.1308420Z adding: test/test-reports/python-pytest/distributed.elastic.multiprocessing.test_api/distributed.elastic.multiprocessing.test_api-12b95803d8942f3a.json (deflated 86%) 2025-12-04T12:25:14.1310110Z adding: test/test-reports/python-pytest/distributed.fsdp.test_shard_utils/distributed.fsdp.test_shard_utils-76ee73cffd398e77.json (deflated 64%) 2025-12-04T12:25:14.1311635Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_optim_state/distributed.checkpoint.test_fsdp_optim_state-f29e492ac7e0fdff.json (deflated 66%) 2025-12-04T12:25:14.1313310Z adding: test/test-reports/python-pytest/distributed.checkpoint.e2e.test_e2e_save_and_load/distributed.checkpoint.e2e.test_e2e_save_and_load-ea436a2b3918b4b7.json (deflated 92%) 2025-12-04T12:25:14.1315018Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_dtensor_resharding/distributed.checkpoint.test_dtensor_resharding-850e82d898db0167.json (deflated 89%) 2025-12-04T12:25:14.1316601Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_memory/distributed.fsdp.test_fsdp_memory-bd1d93d0f6b45624.json (deflated 66%) 2025-12-04T12:25:14.1318047Z adding: test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-8ffd5e5eb5f5ad7d.json (deflated 91%) 2025-12-04T12:25:14.1319608Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_compatibility/distributed.checkpoint.test_compatibility-759684b03ee5bd2d.json (deflated 80%) 2025-12-04T12:25:14.1321446Z adding: test/test-reports/python-pytest/distributed._tools.test_mem_tracker/distributed._tools.test_mem_tracker-e6bb23aea30c734a.json (deflated 73%) 2025-12-04T12:25:14.1322948Z adding: test/test-reports/python-pytest/distributed.elastic.test_control_plane/distributed.elastic.test_control_plane-8adada293373a225.json (deflated 85%) 2025-12-04T12:25:14.1324359Z adding: test/test-reports/python-pytest/distributed.test_fake_pg/distributed.test_fake_pg-79e3fe3f86c7485d.json (deflated 91%) 2025-12-04T12:25:14.1325944Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_model_state/distributed.checkpoint.test_fsdp_model_state-d2d7dab49696755b.json (deflated 67%) 2025-12-04T12:25:14.1327541Z adding: test/test-reports/python-pytest/distributed.test_functional_api/distributed.test_functional_api-d3092064f68d2f41.json (deflated 88%) 2025-12-04T12:25:14.1329237Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_-2322cac9c0cc490f.json (deflated 67%) 2025-12-04T12:25:14.1331004Z adding: test/test-reports/python-pytest/distributed.tensor.debug.test_comm_mode/distributed.tensor.debug.test_comm_mode-8cc829f047ed6143.json (deflated 73%) 2025-12-04T12:25:14.1332419Z adding: test/test-reports/python-pytest/distributed.test_dist2/distributed.test_dist2-7a48db8512284abb.json (deflated 93%) 2025-12-04T12:25:14.1334097Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_grad_scaler/distributed._composable.fsdp.test_fully_shard_grad_scaler-5e3c33eaf29838b0.json (deflated 41%) 2025-12-04T12:25:14.1335708Z adding: test/test-reports/python-pytest/distributed.launcher.test_run/distributed.launcher.test_run-eeaaeb50473e3b00.json (deflated 92%) 
2025-12-04T12:25:14.1337444Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_backward_prefetch/distributed.fsdp.test_fsdp_backward_prefetch-9d6c65a3bd838e6b.json (deflated 42%) 2025-12-04T12:25:14.1339081Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_checkpoint/distributed.checkpoint.test_checkpoint-698955a0be6378e2.json (deflated 87%) 2025-12-04T12:25:14.1340608Z adding: test/test-reports/python-pytest/distributed._pycute.test_coalesce/distributed._pycute.test_coalesce-d2727b6d77166552.json (deflated 38%) 2025-12-04T12:25:14.1342080Z adding: test/test-reports/python-pytest/distributed._pycute.test_complement/distributed._pycute.test_complement-323506218bd25d4f.json (deflated 40%) 2025-12-04T12:25:14.1343596Z adding: test/test-reports/python-pytest/distributed._pycute.test_composition/distributed._pycute.test_composition-91e42d2ac7610498.json (deflated 40%) 2025-12-04T12:25:14.1345062Z adding: test/test-reports/python-pytest/distributed._pycute.test_int_tuple/distributed._pycute.test_int_tuple-1604350619512e65.json (deflated 91%) 2025-12-04T12:25:14.1346538Z adding: test/test-reports/python-pytest/distributed._pycute.test_left_inverse/distributed._pycute.test_left_inverse-7b550f03a54828f5.json (deflated 39%) 2025-12-04T12:25:14.1348053Z adding: test/test-reports/python-pytest/distributed._pycute.test_right_inverse/distributed._pycute.test_right_inverse-5437f0847845b913.json (deflated 40%) 2025-12-04T12:25:14.1349703Z adding: test/test-reports/python-pytest/distributed._composable.test_replicate/distributed._composable.test_replicate-5594e5fd77ce79b5.json (deflated 92%) 2025-12-04T12:25:14.1351333Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_hsdp_checkpoint/distributed.checkpoint.test_hsdp_checkpoint-293bcc74b378a9a0.json (deflated 81%) 2025-12-04T12:25:14.1353041Z adding: test/test-reports/python-pytest/distributed.tensor.parallel.test_parallelize_api/distributed.tensor.parallel.test_parallelize_api-e24bc2790e3eed77.json (deflated 94%) 2025-12-04T12:25:14.1354650Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_state_dict/distributed.fsdp.test_fsdp_state_dict-3c13b82ce7076bc1.json (deflated 97%) 2025-12-04T12:25:14.1356058Z adding: test/test-reports/python-pytest/distributed._pycute.test_typing/distributed._pycute.test_typing-1c9aabc95fed14a1.json (deflated 38%) 2025-12-04T12:25:14.1357442Z adding: test/test-reports/python-pytest/distributed.test_serialization/distributed.test_serialization-5c3790edbaae9c6a.json (deflated 82%) 2025-12-04T12:25:14.1358904Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_ignored_modules/distributed.fsdp.test_fsdp_ignored_modules-c4ab0979e06883a2.json (deflated 88%) 2025-12-04T12:25:14.1360604Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_comm/distributed._composable.fsdp.test_fully_shard_comm-b03b971b17f9f8be.json (deflated 90%) 2025-12-04T12:25:14.1362307Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_sharded_grad_scaler/distributed.fsdp.test_fsdp_sharded_grad_scaler-830facc45336217a.json (deflated 94%) 2025-12-04T12:25:14.1363979Z adding: test/test-reports/python-pytest/distributed._shard.sharding_plan.test_sharding_plan/distributed._shard.sharding_plan.test_sharding_plan-86fe0d16a378ac71.json (deflated 76%) 2025-12-04T12:25:14.1365716Z adding: 
test/test-reports/python-pytest/distributed._shard.sharded_optim.test_sharded_optim/distributed._shard.sharded_optim.test_sharded_optim-a8d576a6cb5a21e5.json (deflated 67%) 2025-12-04T12:25:14.1367481Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_state_dict/distributed._composable.fsdp.test_fully_shard_state_dict-7cd1746803ec2a8b.json (deflated 87%) 2025-12-04T12:25:14.1369064Z adding: test/test-reports/python-pytest/distributed.tensor.test_utils/distributed.tensor.test_utils-ce4dc3e67348c080.json (deflated 90%) 2025-12-04T12:25:14.1370621Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_memory/distributed._composable.fsdp.test_fully_shard_memory-bd84ca434b9abee9.json (deflated 67%) 2025-12-04T12:25:14.1372273Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_state_dict/distributed.checkpoint.test_state_dict-82ab38e24fe889c8.json (deflated 92%) 2025-12-04T12:25:14.1373839Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_state_dict_utils/distributed.checkpoint.test_state_dict_utils-a19642af8d31d778.json (deflated 86%) 2025-12-04T12:25:14.1375541Z adding: test/test-reports/python-pytest/distributed._shard.sharded_tensor.ops.test_embedding/distributed._shard.sharded_tensor.ops.test_embedding-fd33e5d9c41f35fb.json (deflated 68%) 2025-12-04T12:25:14.1377646Z adding: test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard/distributed._shard.sharded_tensor.test_sharded_tensor_reshard-e6bc79067fb0604d.json (deflated 71%) 2025-12-04T12:25:14.1379343Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ef4942791579d03.json (deflated 36%) 2025-12-04T12:25:14.1380752Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-d882aa7ed351d2b7.json (deflated 36%) 2025-12-04T12:25:14.1382142Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-e41d47243c13be74.json (deflated 36%) 2025-12-04T12:25:14.1383550Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ed2ccb680132309.json (deflated 36%) 2025-12-04T12:25:14.1384994Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-a86d7398eb9ff93b.json (deflated 37%) 2025-12-04T12:25:14.1386397Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-50f191d4627fdfd2.json (deflated 36%) 2025-12-04T12:25:14.1387790Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8cb70355957e1b4b.json (deflated 36%) 2025-12-04T12:25:14.1389293Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-bbde3500be39702b.json (deflated 36%) 2025-12-04T12:25:14.1390653Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-1805de606cf78685.json (deflated 37%) 2025-12-04T12:25:14.1392008Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8a898c87fa4f8fd3.json (deflated 37%) 2025-12-04T12:25:14.1393444Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-41764b12ccdf212e.json (deflated 46%) 2025-12-04T12:25:14.1394807Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-aee5aa2ded024d85.json (deflated 46%) 2025-12-04T12:25:14.1396148Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-8800a2e7b955ab16.json (deflated 46%) 2025-12-04T12:25:14.1397483Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-3a092f5472894a7f.json (deflated 46%) 2025-12-04T12:25:14.1398824Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-f628509e7e3f2a1f.json (deflated 46%) 2025-12-04T12:25:14.1400162Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-c1a78b733abc6caa.json (deflated 46%) 2025-12-04T12:25:14.1401465Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0991bf72558fb22b.json (deflated 33%) 2025-12-04T12:25:14.1402725Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aa6ce215ba96a24c.json (deflated 36%) 2025-12-04T12:25:14.1403977Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-16fe1d620732710b.json (deflated 33%) 2025-12-04T12:25:14.1405213Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3fe1795a5d3e5b88.json (deflated 33%) 2025-12-04T12:25:14.1406470Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6c7276bb9fa9eee2.json (deflated 34%) 2025-12-04T12:25:14.1407716Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cd50578f9742b761.json (deflated 33%) 2025-12-04T12:25:14.1408974Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5e60172a210dc8b6.json (deflated 34%) 2025-12-04T12:25:14.1410209Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-873ae68d43267ac9.json (deflated 33%) 2025-12-04T12:25:14.1411458Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-34c50e4612c9fea4.json (deflated 33%) 2025-12-04T12:25:14.1412713Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d54fb6be7a931b62.json (deflated 33%) 2025-12-04T12:25:14.1413963Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2259b8bd184524fc.json (deflated 34%) 2025-12-04T12:25:14.1415210Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8f01caa16144b040.json (deflated 33%) 2025-12-04T12:25:14.1416542Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-31de274c3cb59c01.json (deflated 33%) 2025-12-04T12:25:14.1418000Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-db19637423ab0dbc.json (deflated 34%) 2025-12-04T12:25:14.1419292Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b23ea90304491b65.json (deflated 34%) 2025-12-04T12:25:14.1420580Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eaee01f734bb6504.json (deflated 33%) 2025-12-04T12:25:14.1422010Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0fa860b184f8ddb6.json (deflated 33%) 
2025-12-04T12:25:14.1423307Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33cbbe588c8f840c.json (deflated 34%) 2025-12-04T12:25:14.1424597Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-de8dc85b62067611.json (deflated 34%) 2025-12-04T12:25:14.1425993Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0f2cd4f378b677f0.json (deflated 34%) 2025-12-04T12:25:14.1427307Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e35b0454119a9f51.json (deflated 34%) 2025-12-04T12:25:14.1428598Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d98cd20152af5d53.json (deflated 34%) 2025-12-04T12:25:14.1429889Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3982ee850d6ce795.json (deflated 33%) 2025-12-04T12:25:14.1431179Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-08455987c8f710af.json (deflated 34%) 2025-12-04T12:25:14.1432554Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e90446a7a06b5b78.json (deflated 34%) 2025-12-04T12:25:14.1433809Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3abd929020861bdc.json (deflated 34%) 2025-12-04T12:25:14.1435066Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d79cb42da7e54a79.json (deflated 33%) 2025-12-04T12:25:14.1436319Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1a14244d1e7f6bb2.json (deflated 35%) 2025-12-04T12:25:14.1437556Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a80b6bac28c5c972.json (deflated 33%) 2025-12-04T12:25:14.1438810Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bf45f3c093461361.json (deflated 34%) 2025-12-04T12:25:14.1440057Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-81160b788c5abcc2.json (deflated 34%) 2025-12-04T12:25:14.1441305Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2242d642afc7f886.json (deflated 33%) 2025-12-04T12:25:14.1442539Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-327f840cbb3f5094.json (deflated 36%) 2025-12-04T12:25:14.1464318Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-724f786ab432a45b.json (deflated 35%) 2025-12-04T12:25:14.1465763Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aae15a76989ce46a.json (deflated 35%) 2025-12-04T12:25:14.1467057Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ee273f849859fe9.json (deflated 35%) 2025-12-04T12:25:14.1468352Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93baf128de560649.json (deflated 35%) 2025-12-04T12:25:14.1469856Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1f85ec05eddb726d.json (deflated 34%) 2025-12-04T12:25:14.1471118Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c9eb752317a73e18.json (deflated 34%) 2025-12-04T12:25:14.1472358Z 
adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cedb520e520b4782.json (deflated 35%) 2025-12-04T12:25:14.1473615Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e69dd1a2e9fba2dc.json (deflated 35%) 2025-12-04T12:25:14.1474857Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-47c9021380160661.json (deflated 36%) 2025-12-04T12:25:14.1476101Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-681adc1d59f04282.json (deflated 35%) 2025-12-04T12:25:14.1477327Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1755a27e81246495.json (deflated 35%) 2025-12-04T12:25:14.1478655Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b2036226275eb311.json (deflated 35%) 2025-12-04T12:25:14.1479948Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f50e0fff8c24c86.json (deflated 36%) 2025-12-04T12:25:14.1481203Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d908f57090f2acd6.json (deflated 35%) 2025-12-04T12:25:14.1482450Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ac7a92e764fd2c8b.json (deflated 35%) 2025-12-04T12:25:14.1483694Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2f80e6d84c47c0a7.json (deflated 36%) 2025-12-04T12:25:14.1484945Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2042e0d50243da8a.json (deflated 36%) 2025-12-04T12:25:14.1486199Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9adcd8663666ac.json (deflated 35%) 2025-12-04T12:25:14.1487454Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-246370ceca8d8d8b.json (deflated 35%) 2025-12-04T12:25:14.1488695Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f75c8f9699a93e6a.json (deflated 36%) 2025-12-04T12:25:14.1489936Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-830d90348309a50c.json (deflated 35%) 2025-12-04T12:25:14.1491179Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-257d76299fdbf250.json (deflated 35%) 2025-12-04T12:25:14.1492418Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fa0b0b810d894be9.json (deflated 34%) 2025-12-04T12:25:14.1493648Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b713da153aca8219.json (deflated 35%) 2025-12-04T12:25:14.1494895Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-812da336a80f282a.json (deflated 32%) 2025-12-04T12:25:14.1496140Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2be07987a59e5da5.json (deflated 32%) 2025-12-04T12:25:14.1497684Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0d952f420fed2de5.json (deflated 33%) 2025-12-04T12:25:14.1498959Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d29bf39728651f67.json (deflated 33%) 2025-12-04T12:25:14.1500235Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-01e88d26c5e6aa85.json (deflated 33%) 2025-12-04T12:25:14.1501520Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efe3194372b4e6.json (deflated 32%) 2025-12-04T12:25:14.1502843Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ccf063a53847c36.json (deflated 33%) 2025-12-04T12:25:14.1504121Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-72be92db0e827d7f.json (deflated 32%) 2025-12-04T12:25:14.1505403Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-84f86de4e3aa962a.json (deflated 33%) 2025-12-04T12:25:14.1506702Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e5c4d09fb827cb7f.json (deflated 32%) 2025-12-04T12:25:14.1507983Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-165d83ae78886ff8.json (deflated 33%) 2025-12-04T12:25:14.1509442Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-76f6fcd9346eff0a.json (deflated 33%) 2025-12-04T12:25:14.1510663Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e84bdf3d05666f91.json (deflated 32%) 2025-12-04T12:25:14.1511927Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a357bf2b1c694c62.json (deflated 33%) 2025-12-04T12:25:14.1513169Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b1b5f73bcb8b828f.json (deflated 33%) 2025-12-04T12:25:14.1514369Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e742397162ed9e3d.json (deflated 33%) 2025-12-04T12:25:14.1515577Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f3a1c05a7b5c0fa8.json (deflated 33%) 2025-12-04T12:25:14.1516789Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fcd37833b58d4bea.json (deflated 33%) 2025-12-04T12:25:14.1518007Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e22bb2e46b3ab636.json (deflated 33%) 2025-12-04T12:25:14.1519221Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d319014b034c95bf.json (deflated 32%) 2025-12-04T12:25:14.1520418Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-393bf6208ab91711.json (deflated 32%) 2025-12-04T12:25:14.1521994Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9e40b9771000a0.json (deflated 32%) 2025-12-04T12:25:14.1523287Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d597ca27d8328fc4.json (deflated 33%) 2025-12-04T12:25:14.1524576Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ff18cf4d50e44f39.json (deflated 33%) 2025-12-04T12:25:14.1525858Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0be906a8969ec101.json (deflated 33%) 2025-12-04T12:25:14.1527149Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-158f1ad05ae2a64b.json (deflated 34%) 2025-12-04T12:25:14.1528434Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87453a67a1ebaea6.json (deflated 33%) 2025-12-04T12:25:14.1529716Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-94f3fac53aec8990.json (deflated 33%) 2025-12-04T12:25:14.1530983Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93576123b2405b32.json (deflated 33%) 2025-12-04T12:25:14.1532272Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f6666d1683ab3f1d.json (deflated 33%) 2025-12-04T12:25:14.1533663Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-54b039aca43fe5b7.json (deflated 33%) 2025-12-04T12:25:14.1534945Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8eea24e340cd482b.json (deflated 32%) 2025-12-04T12:25:14.1536156Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-abf845b544fb7d20.json (deflated 33%) 2025-12-04T12:25:14.1537674Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f27d8d563aeff333.json (deflated 33%) 2025-12-04T12:25:14.1538964Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b98a8d5dfa728efd.json (deflated 33%) 2025-12-04T12:25:14.1540256Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f9a146a8fac2af4d.json (deflated 33%) 2025-12-04T12:25:14.1541543Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d8bb6ca9e3ae378b.json (deflated 33%) 2025-12-04T12:25:14.1542834Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-604db34ae5cbb6b2.json (deflated 34%) 2025-12-04T12:25:14.1544224Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6d6d34df2e34630b.json (deflated 33%) 2025-12-04T12:25:14.1545562Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-520dfe050df69b4b.json (deflated 33%) 2025-12-04T12:25:14.1546837Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2074cd035f8dc8fc.json (deflated 34%) 2025-12-04T12:25:14.1548117Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-468dffdf4603fb37.json (deflated 33%) 2025-12-04T12:25:14.1549584Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fb8500504162f453.json (deflated 33%) 2025-12-04T12:25:14.1550786Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-56d2f4c749889dbc.json (deflated 33%) 2025-12-04T12:25:14.1552004Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8cef0d6061a45be8.json (deflated 34%) 2025-12-04T12:25:14.1553209Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93d1d438aff7bb95.json (deflated 33%) 2025-12-04T12:25:14.1554407Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5c11159a66fb94a9.json (deflated 33%) 2025-12-04T12:25:14.1555621Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c1ea079cea0d8e56.json (deflated 33%) 2025-12-04T12:25:14.1556997Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f25b64af298ca601.json (deflated 33%) 2025-12-04T12:25:14.1558192Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87383ac3904bfe89.json (deflated 33%) 2025-12-04T12:25:14.1559398Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d793a1fedd0d4f15.json (deflated 33%) 2025-12-04T12:25:14.1560611Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b67795a049190b1d.json (deflated 33%) 2025-12-04T12:25:14.1561821Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bde1923c97f63381.json (deflated 34%) 2025-12-04T12:25:14.1563015Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2540c713fc68453d.json (deflated 33%) 2025-12-04T12:25:14.1564220Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8d1d058689da62ff.json (deflated 46%) 2025-12-04T12:25:14.1565415Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0c93a8978347968a.json (deflated 33%) 2025-12-04T12:25:14.1566646Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-18641772917d69fc.json (deflated 33%) 2025-12-04T12:25:14.1567839Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6a77c9a2c337df36.json (deflated 34%) 2025-12-04T12:25:14.1569039Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efbb19e469ebb7.json (deflated 33%) 2025-12-04T12:25:14.1570248Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eac363af2c24f931.json (deflated 33%) 2025-12-04T12:25:14.1571447Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33bf8b4540a40636.json (deflated 33%) 2025-12-04T12:25:14.1572746Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-45778cf420dbd19f.json (deflated 35%) 2025-12-04T12:25:14.1573892Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-7dfffc535a3e90f1.json (deflated 35%) 2025-12-04T12:25:14.1575092Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4b2795b0e7efac26.json (deflated 34%) 2025-12-04T12:25:14.1576278Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2b369bec34855654.json (deflated 35%) 2025-12-04T12:25:14.1577725Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d6b15d261538e27e.json (deflated 35%) 2025-12-04T12:25:14.1579012Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ef76d7bc1711751.json (deflated 34%) 2025-12-04T12:25:14.1580292Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0343427a5558824f.json (deflated 31%) 2025-12-04T12:25:14.1581574Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f70a63e56a4848b.json (deflated 33%) 2025-12-04T12:25:14.1582863Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-821ac567b5ed63bc.json (deflated 33%) 2025-12-04T12:25:14.1584412Z adding: 
test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor/distributed._shard.sharded_tensor.test_sharded_tensor-ae33be926ad38292.json (deflated 95%) 2025-12-04T12:25:14.1585976Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4e483f68cef17162.json (deflated 33%) 2025-12-04T12:25:14.1587259Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-05f5b130753b2983.json (deflated 33%) 2025-12-04T12:25:14.1588539Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e16e53ef8db6995.json (deflated 33%) 2025-12-04T12:25:14.1589872Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1e281dcef1930575.json (deflated 33%) 2025-12-04T12:25:14.1591088Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2b466e71a200bcdc.json (deflated 33%) 2025-12-04T12:25:14.1592297Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-325c8a002e1c83a2.json (deflated 49%) 2025-12-04T12:25:14.1593510Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c0b6a576b76efd0.json (deflated 32%) 2025-12-04T12:25:14.1594913Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e47f2e15272edbaf.json (deflated 33%) 2025-12-04T12:25:14.1596156Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a9e19469eb1a06d4.json (deflated 34%) 2025-12-04T12:25:14.1597498Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-df7444533096a1d8.json (deflated 33%) 2025-12-04T12:25:14.1598747Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d87d87bc823f3dba.json (deflated 33%) 2025-12-04T12:25:14.1599950Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4a50a5ac8cd03017.json (deflated 34%) 2025-12-04T12:25:14.1601164Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0ae50f0e1c874ad8.json (deflated 33%) 2025-12-04T12:25:14.1602383Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7dbf8411ea4b6ce3.json (deflated 33%) 2025-12-04T12:25:14.1603592Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2a6114c53cde50d7.json (deflated 33%) 2025-12-04T12:25:14.1604794Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d109d91d9cd820a7.json (deflated 33%) 2025-12-04T12:25:14.1606105Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e589af2daee12d3.json (deflated 33%) 2025-12-04T12:25:14.1607339Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ff536a30913e6717.json (deflated 35%) 2025-12-04T12:25:14.1608517Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-16e8bb0ec51136f2.json (deflated 36%) 2025-12-04T12:25:14.1609661Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-688fcf4f5f0deff2.json (deflated 35%) 2025-12-04T12:25:14.1610791Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c2f4984a060c2ce4.json (deflated 36%) 
2025-12-04T12:25:14.1611931Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4874c9e324e6599b.json (deflated 35%) 2025-12-04T12:25:14.1613065Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-81b232fd98a6eda2.json (deflated 35%) 2025-12-04T12:25:14.1614214Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-dbedd4dfa730b471.json (deflated 35%) 2025-12-04T12:25:14.1615353Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e94fe5aed063a3e7.json (deflated 35%) 2025-12-04T12:25:14.1616555Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-191142456fb777f7.json (deflated 35%) 2025-12-04T12:25:14.1617983Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d909bdccb7ddf2c0.json (deflated 35%) 2025-12-04T12:25:14.1619268Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e3a4388e42e1415.json (deflated 34%) 2025-12-04T12:25:14.1620535Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5f42a263385a17.json (deflated 36%) 2025-12-04T12:25:14.1621994Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a6537375079d62ca.json (deflated 35%) 2025-12-04T12:25:14.1623286Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-515a3b961a30c93e.json (deflated 35%) 2025-12-04T12:25:14.1624569Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-247b406154c62e2b.json (deflated 35%) 2025-12-04T12:25:14.1625837Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-54fc92777b10ce8b.json (deflated 34%) 2025-12-04T12:25:14.1627134Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-07a5e82fccbcefb0.json (deflated 35%) 2025-12-04T12:25:14.1628420Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-98372eb164ddb8a6.json (deflated 36%) 2025-12-04T12:25:14.1629781Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9a91f2cdfa9f567b.json (deflated 36%) 2025-12-04T12:25:14.1631057Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-578f1554447ed157.json (deflated 35%) 2025-12-04T12:25:14.1632342Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-cba9e46262707896.json (deflated 36%) 2025-12-04T12:25:14.1633643Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-b5cc6836ef1a3879.json (deflated 34%) 2025-12-04T12:25:14.1634788Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1a086feba79f79de.json (deflated 35%) 2025-12-04T12:25:14.1635921Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fd712f2413b91025.json (deflated 33%) 2025-12-04T12:25:14.1637052Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e275020a83607d9.json (deflated 45%) 2025-12-04T12:25:14.1638270Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-32cb996256d67719.json (deflated 49%) 2025-12-04T12:25:14.1639451Z 
adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-281110f64c593b33.json (deflated 34%) 2025-12-04T12:25:14.1640584Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ab551cc6e4b8fc0e.json (deflated 35%) 2025-12-04T12:25:14.1641732Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-bb4b38110c51be7b.json (deflated 35%) 2025-12-04T12:25:14.1642880Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d76cceb106b5a87a.json (deflated 34%) 2025-12-04T12:25:14.1644034Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f5087c7fb2c85ea4.json (deflated 34%) 2025-12-04T12:25:14.1645180Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5bf92e22e16000ae.json (deflated 35%) 2025-12-04T12:25:14.1646314Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a2df2e6eff7daa02.json (deflated 32%) 2025-12-04T12:25:14.1647456Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-62cf8d48558e6611.json (deflated 48%) 2025-12-04T12:25:14.1648599Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-008b4e727f5be082.json (deflated 31%) 2025-12-04T12:25:14.1649743Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0b38d08cedf93968.json (deflated 32%) 2025-12-04T12:25:14.1650867Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0615767c47cb824b.json (deflated 35%) 2025-12-04T12:25:14.1652009Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3a85b82e41e52e7b.json (deflated 34%) 2025-12-04T12:25:14.1653155Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-670c4eb9ad8ac35a.json (deflated 34%) 2025-12-04T12:25:14.1654295Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1ae993f40739468a.json (deflated 34%) 2025-12-04T12:25:14.1655412Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1379655e313056b3.json (deflated 34%) 2025-12-04T12:25:14.1656612Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17d32ccc8ec15e49.json (deflated 34%) 2025-12-04T12:25:14.1658055Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5afe3c6d472874.json (deflated 34%) 2025-12-04T12:25:14.1659353Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-71d8c77dbd2b6cd3.json (deflated 33%) 2025-12-04T12:25:14.1660677Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9e93da4b49ea34dc.json (deflated 33%) 2025-12-04T12:25:14.1661971Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-09fe633d76933c88.json (deflated 34%) 2025-12-04T12:25:14.1663261Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4db84368319deb77.json (deflated 33%) 2025-12-04T12:25:14.1664543Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-867c58ec01067ba4.json (deflated 34%) 2025-12-04T12:25:14.1665818Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f4ea20dbc7c23240.json (deflated 34%) 2025-12-04T12:25:14.1667104Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-197b01c054eb8425.json (deflated 33%) 2025-12-04T12:25:14.1668383Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5f78ef08e5f67618.json (deflated 33%) 2025-12-04T12:25:14.1669754Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5dd09e666c5e73ac.json (deflated 34%) 2025-12-04T12:25:14.1670910Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8d5b24102af3938b.json (deflated 34%) 2025-12-04T12:25:14.1672049Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7ed88178415e82af.json (deflated 33%) 2025-12-04T12:25:14.1673197Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17ddadec6a584fc8.json (deflated 34%) 2025-12-04T12:25:14.1674432Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-db161ee1d414a014.json (stored 0%) 2025-12-04T12:25:14.1675739Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aee66205f8817bd7.json (stored 0%) 2025-12-04T12:25:14.1677052Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4fea7b2e6cf3a65.json (stored 0%) 2025-12-04T12:25:14.1678371Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-422b22169e3a08f1.json (stored 0%) 2025-12-04T12:25:14.1679672Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ec15082b412f697.json (stored 0%) 2025-12-04T12:25:14.1680971Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a2eda26248d83b8e.json (stored 0%) 2025-12-04T12:25:14.1682275Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e12df5e946a2399b.json (stored 0%) 2025-12-04T12:25:14.1683585Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4ab25792bd6780ce.json (stored 0%) 2025-12-04T12:25:14.1684893Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee61fca4ae363844.json (stored 0%) 2025-12-04T12:25:14.1686440Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e43b258f943c7149.json (stored 0%) 2025-12-04T12:25:14.1687820Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed8ce545db3785b0.json (stored 0%) 2025-12-04T12:25:14.1689204Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51bd71d27c2db4f0.json (stored 0%) 2025-12-04T12:25:14.1690598Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-72f602b330e606cb.json (stored 0%) 2025-12-04T12:25:14.1692016Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-94537227bc12f698.json 
(stored 0%) 2025-12-04T12:25:14.1693381Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7368dd24235350f.json (stored 0%) 2025-12-04T12:25:14.1694764Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-12e19ecac0707a9f.json (stored 0%) 2025-12-04T12:25:14.1696406Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49aeb17bc0069227.json (stored 0%) 2025-12-04T12:25:14.1698033Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-82678a9127d50625.json (stored 0%) 2025-12-04T12:25:14.1699528Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eeb723e5683986dd.json (deflated 37%) 2025-12-04T12:25:14.1701104Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7dd0923a385a5b44.json (deflated 44%) 2025-12-04T12:25:14.1702636Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875b3394fe6124ff.json (deflated 37%) 2025-12-04T12:25:14.1704139Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a01719010801f0eb.json (deflated 37%) 2025-12-04T12:25:14.1705653Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-abb38b8b64296782.json (deflated 37%) 2025-12-04T12:25:14.1707153Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-35d5d4bfe910714e.json (deflated 37%) 2025-12-04T12:25:14.1708671Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fcdbe5c8d6246957.json (deflated 44%) 2025-12-04T12:25:14.1710266Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f2d32d76cd9ea4c.json (deflated 45%) 2025-12-04T12:25:14.1711610Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d01dd7848e58726.json (deflated 43%) 2025-12-04T12:25:14.1712937Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b37ec36150974cdc.json (deflated 43%) 2025-12-04T12:25:14.1714272Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a5c97ba7476f9699.json (deflated 43%) 2025-12-04T12:25:14.1715806Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f7bc9881e047dd1.json (deflated 43%) 2025-12-04T12:25:14.1717226Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0d8492641a4c3af3.json (deflated 43%) 2025-12-04T12:25:14.1718644Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a118777d82e8d7e.json (deflated 37%) 2025-12-04T12:25:14.1720056Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6f1779e409eaf9fb.json (deflated 44%) 2025-12-04T12:25:14.1721816Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a2c564c0db133fb.json (deflated 37%) 2025-12-04T12:25:14.1723337Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4e9ae811cf30c32.json (deflated 44%) 2025-12-04T12:25:14.1724920Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a0ffda73db67d0e.json (deflated 44%) 2025-12-04T12:25:14.1726427Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b10091684b37c862.json (deflated 42%) 2025-12-04T12:25:14.1727944Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-362536b218c78604.json (deflated 37%) 2025-12-04T12:25:14.1729452Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2a2b6d5dc912ba1.json (deflated 37%) 2025-12-04T12:25:14.1730969Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2bfa612f1908806e.json (deflated 43%) 2025-12-04T12:25:14.1732475Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c241632c1bd2254.json (deflated 37%) 2025-12-04T12:25:14.1734222Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-300d15ebe169a67d.json (deflated 57%) 2025-12-04T12:25:14.1735671Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2664154f3bddb6ff.json (deflated 44%) 2025-12-04T12:25:14.1737345Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b262143f686a88dd.json (deflated 43%) 2025-12-04T12:25:14.1738857Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c004db07f7b0860b.json (deflated 44%) 2025-12-04T12:25:14.1740364Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc18c93bde07fa33.json (deflated 44%) 2025-12-04T12:25:14.1741886Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d33e44b619f43cc1.json (deflated 57%) 2025-12-04T12:25:14.1743400Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c44272ce3d4ac199.json (deflated 38%) 2025-12-04T12:25:14.1744918Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea07358affb5e144.json (deflated 37%) 2025-12-04T12:25:14.1746426Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c57c7620876639a.json (deflated 43%) 2025-12-04T12:25:14.1747920Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eede0e2726c06cab.json (deflated 37%) 2025-12-04T12:25:14.1749589Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a276c210ef7f6689.json (deflated 43%) 2025-12-04T12:25:14.1750935Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd59825a029f8f8b.json (deflated 37%) 2025-12-04T12:25:14.1752274Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f5a9742e1242440.json (deflated 38%) 2025-12-04T12:25:14.1753600Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6b0873e59b83bf9a.json (deflated 37%) 2025-12-04T12:25:14.1754935Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64bbf1c836e72a15.json (deflated 36%) 2025-12-04T12:25:14.1756272Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83300f2b97b0a07.json (deflated 37%) 2025-12-04T12:25:14.1757613Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-46e1a3ccabb4ea53.json (deflated 37%) 2025-12-04T12:25:14.1758988Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52cd579e7fe5892c.json (deflated 44%) 2025-12-04T12:25:14.1760318Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb876d9d148638c4.json (deflated 44%) 2025-12-04T12:25:14.1761644Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-419043608d870248.json (deflated 45%) 2025-12-04T12:25:14.1762978Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03caaef3ff0396d9.json (deflated 45%) 2025-12-04T12:25:14.1764322Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a49158b49188737a.json (deflated 43%) 2025-12-04T12:25:14.1765649Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9371e4128a3ac8fe.json (deflated 43%) 2025-12-04T12:25:14.1767036Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf7e7c630fc800f5.json (deflated 43%) 2025-12-04T12:25:14.1768407Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f263367a9b8ff205.json (deflated 44%) 2025-12-04T12:25:14.1769756Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9da5cc1abf82fc88.json (deflated 44%) 2025-12-04T12:25:14.1771107Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17270d7c5dcce82d.json (deflated 43%) 2025-12-04T12:25:14.1772437Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a8a0406f3c10fb.json (deflated 37%) 2025-12-04T12:25:14.1773774Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8955835fa53fe405.json (deflated 44%) 2025-12-04T12:25:14.1775114Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41e8000da4470974.json (deflated 37%) 2025-12-04T12:25:14.1776513Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17b82ffe3c62718d.json (deflated 36%) 2025-12-04T12:25:14.1778148Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-550a077945687423.json (deflated 42%) 2025-12-04T12:25:14.1779650Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-97658b25492d180c.json (deflated 37%) 2025-12-04T12:25:14.1781157Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5ba6b434230b8a31.json (deflated 43%) 2025-12-04T12:25:14.1782670Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ab85cfcce385bb9.json (deflated 37%) 2025-12-04T12:25:14.1784169Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205c67b3e9ea2006.json (deflated 37%) 2025-12-04T12:25:14.1785678Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a7727ff60499e455.json (deflated 37%) 2025-12-04T12:25:14.1787190Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5545774781103441.json (deflated 37%) 2025-12-04T12:25:14.1788804Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69b99129eec5d274.json (deflated 37%) 2025-12-04T12:25:14.1790323Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-71229775f4c708c6.json (deflated 45%) 2025-12-04T12:25:14.1791647Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ef94932e8a93743e.json (deflated 43%) 2025-12-04T12:25:14.1792986Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-830e1894dcf5c994.json (deflated 43%) 2025-12-04T12:25:14.1794321Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d6ec9fe8576de151.json (deflated 37%) 2025-12-04T12:25:14.1795665Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ac8ca9bd1994ece.json (deflated 38%) 2025-12-04T12:25:14.1796997Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3403d5bb8935cb4e.json (deflated 37%) 2025-12-04T12:25:14.1798392Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0c166deb400ad9d.json (deflated 38%) 2025-12-04T12:25:14.1799768Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-60e4e17b51df739f.json (deflated 36%) 2025-12-04T12:25:14.1801108Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22eb7410be2437d9.json (deflated 37%) 2025-12-04T12:25:14.1802451Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9ee70791b9debd6c.json (deflated 45%) 2025-12-04T12:25:14.1803786Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81abecf194df2c45.json (deflated 45%) 2025-12-04T12:25:14.1805118Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1136154023961765.json (deflated 43%) 2025-12-04T12:25:14.1806458Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cfef205e8493de16.json (deflated 37%) 2025-12-04T12:25:14.1807809Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd599f355b8caaeb.json (deflated 37%) 2025-12-04T12:25:14.1809146Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62ca7bd8b65dea10.json (deflated 44%) 2025-12-04T12:25:14.1810504Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b3d3e55cfe315fc5.json (deflated 37%) 2025-12-04T12:25:14.1811853Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a45eb631d6c35ef.json (deflated 44%) 2025-12-04T12:25:14.1813202Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aae6fb78854ea6ff.json (deflated 38%) 2025-12-04T12:25:14.1814548Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9eef2c9b45729eeb.json (deflated 47%) 2025-12-04T12:25:14.1815881Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d106ae3bbe7d9e5c.json (deflated 37%) 2025-12-04T12:25:14.1817532Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ff643138d43dd85.json (deflated 56%) 2025-12-04T12:25:14.1819037Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c72d0c28afc7b8b.json (deflated 37%) 2025-12-04T12:25:14.1820586Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8cb6ed13882ace9d.json (deflated 37%) 2025-12-04T12:25:14.1822279Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51d5ea88c29b6ed7.json (deflated 43%) 2025-12-04T12:25:14.1823795Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0e2af92baadfb43c.json (deflated 37%) 2025-12-04T12:25:14.1825304Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ee64e4888310471.json (deflated 37%) 2025-12-04T12:25:14.1826813Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2124f6a7f1f8a6ad.json (deflated 37%) 2025-12-04T12:25:14.1828323Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a72595ddb271e95.json (deflated 43%) 2025-12-04T12:25:14.1829949Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f5a0fd7e9efb76d5.json (deflated 44%) 2025-12-04T12:25:14.1831508Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f05ec777ac110fb6.json (deflated 38%) 2025-12-04T12:25:14.1833214Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c4dbe227aaf8cd2.json (deflated 43%) 2025-12-04T12:25:14.1834636Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d8d80edc2b8c69e.json (deflated 37%) 2025-12-04T12:25:14.1836038Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-50add8f3174dd7ac.json (deflated 37%) 2025-12-04T12:25:14.1837466Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-851cdc069dcc69f7.json (deflated 37%) 2025-12-04T12:25:14.1839085Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1acd79e907003b41.json (deflated 46%) 2025-12-04T12:25:14.1840543Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0ff1f71f9283f58.json (deflated 45%) 2025-12-04T12:25:14.1842012Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-65237f33092a4b4f.json (deflated 37%) 2025-12-04T12:25:14.1843481Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5046dc8bfb623fa3.json (deflated 37%) 2025-12-04T12:25:14.1844950Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4878dd0838c676b7.json (deflated 44%) 2025-12-04T12:25:14.1846421Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-66566e960af2b7cd.json (deflated 37%) 2025-12-04T12:25:14.1847895Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9252bf6025e90d42.json (deflated 37%) 2025-12-04T12:25:14.1849356Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5b920f5d1c4972a5.json (deflated 37%) 2025-12-04T12:25:14.1850964Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41378464ce08003d.json (deflated 37%) 2025-12-04T12:25:14.1852314Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee4c603fd47011fa.json (deflated 44%) 2025-12-04T12:25:14.1853660Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9973927e7b530617.json (deflated 45%) 2025-12-04T12:25:14.1855051Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faddb0db331380df.json (deflated 43%) 2025-12-04T12:25:14.1856461Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-babf9f26b0f01a05.json (deflated 43%) 2025-12-04T12:25:14.1858119Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-682bb4a108ba0cff.json (deflated 43%) 2025-12-04T12:25:14.1859646Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0185f9ec4d4c49f.json (deflated 43%) 2025-12-04T12:25:14.1861171Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-011699f09fdd352f.json (deflated 43%) 2025-12-04T12:25:14.1862685Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c6b066059948ead.json (deflated 37%) 2025-12-04T12:25:14.1864271Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22fab5f0e190ff66.json (deflated 44%) 2025-12-04T12:25:14.1865822Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-55702aa5023cfcc5.json (deflated 37%) 2025-12-04T12:25:14.1867342Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccae7814a1c4777f.json (deflated 44%) 2025-12-04T12:25:14.1868952Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5bd848f11487517d.json (deflated 44%) 2025-12-04T12:25:14.1870422Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27d68b49187eba1f.json (deflated 42%) 2025-12-04T12:25:14.1871786Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cf1bc9411dde71e0.json (deflated 37%) 2025-12-04T12:25:14.1873138Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-445a5d7115d23df5.json (deflated 37%) 2025-12-04T12:25:14.1874495Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-44a168cde9f7a829.json (deflated 43%) 2025-12-04T12:25:14.1875841Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1ba388d3de704172.json (deflated 37%) 2025-12-04T12:25:14.1877197Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd986c0befb813c2.json (deflated 57%) 2025-12-04T12:25:14.1878547Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4610efe5376dfca1.json (deflated 44%) 2025-12-04T12:25:14.1879892Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8b4358fed50c59f1.json (deflated 43%) 2025-12-04T12:25:14.1881232Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-526a02721a1ba5da.json (deflated 44%) 2025-12-04T12:25:14.1882580Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c0978e54cc6fc10.json (deflated 44%) 2025-12-04T12:25:14.1883930Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf5a35496e65d5e4.json (deflated 57%) 2025-12-04T12:25:14.1885283Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee9c4c3ca48fe737.json (deflated 37%) 2025-12-04T12:25:14.1886670Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5ca791415d7ead2.json (deflated 37%) 2025-12-04T12:25:14.1888010Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4b280a14c5b58c7c.json (deflated 43%) 2025-12-04T12:25:14.1889353Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f1e7a55058f0a18.json (deflated 37%) 2025-12-04T12:25:14.1890702Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c9d23e4c6bbfd6d1.json (deflated 43%) 2025-12-04T12:25:14.1892048Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d04adc5353a474ef.json (deflated 37%) 2025-12-04T12:25:14.1893386Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dd5c3fba431f03e3.json (deflated 38%) 2025-12-04T12:25:14.1894782Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23246ae737e62ded.json (deflated 37%) 2025-12-04T12:25:14.1896156Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aa7ae0f58f2813b.json (deflated 36%) 2025-12-04T12:25:14.1897850Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cd7e251b7cd67b87.json (deflated 37%) 2025-12-04T12:25:14.1899370Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ffef4b2a54e0ec6.json (deflated 37%) 2025-12-04T12:25:14.1900888Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f47719c8fab0f3fd.json (deflated 44%) 2025-12-04T12:25:14.1902425Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7f97df23e3af62b7.json (deflated 44%) 2025-12-04T12:25:14.1903949Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d9b569377c5e6b5.json (deflated 45%) 2025-12-04T12:25:14.1905466Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e79d7fc843c87404.json (deflated 45%) 2025-12-04T12:25:14.1906973Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b4908c887012bf3.json (deflated 43%) 2025-12-04T12:25:14.1908502Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-15d9380e1c9a62c7.json (deflated 43%) 2025-12-04T12:25:14.1910114Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-89d48b8548171ec2.json (deflated 43%) 2025-12-04T12:25:14.1911475Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e87d273ae3e5c7f4.json (deflated 44%) 2025-12-04T12:25:14.1912839Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5becb9fcc2b2a740.json (deflated 44%) 2025-12-04T12:25:14.1914176Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e50500c3a0076f9a.json (deflated 43%) 2025-12-04T12:25:14.1915527Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c28f45efdfac39c4.json (deflated 37%) 2025-12-04T12:25:14.1916885Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9fcea5b98362b6a.json (deflated 44%) 2025-12-04T12:25:14.1918269Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23763de39322c899.json (deflated 37%) 2025-12-04T12:25:14.1919613Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7a5837d4cf564eb.json (deflated 36%) 2025-12-04T12:25:14.1921101Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6098aefa2030078.json (deflated 42%) 2025-12-04T12:25:14.1922802Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d3b389690949ffc.json (deflated 37%) 2025-12-04T12:25:14.1924327Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-00c0b12dc56300ed.json (deflated 43%) 2025-12-04T12:25:14.1925854Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875462dd555a5412.json (deflated 37%) 2025-12-04T12:25:14.1927462Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5da26e78fc052180.json (deflated 37%) 2025-12-04T12:25:14.1929024Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-705b7a3606470644.json (deflated 37%) 2025-12-04T12:25:14.1930537Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3996750239d4977f.json (deflated 37%) 2025-12-04T12:25:14.1932054Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b1bfbeb9b34c8574.json (deflated 37%) 2025-12-04T12:25:14.1933670Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c5cc720d34bebc6.json (deflated 45%) 2025-12-04T12:25:14.1935009Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5eb76bc9735e309.json (deflated 43%) 2025-12-04T12:25:14.1936413Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a28d2b8c4bb8b97.json (deflated 43%) 2025-12-04T12:25:14.1938064Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2fa0ff1a8410ed4.json (deflated 37%) 2025-12-04T12:25:14.1939571Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42750e8459e7d15b.json (deflated 39%) 2025-12-04T12:25:14.1941080Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d44ddde7846d301e.json (deflated 37%) 2025-12-04T12:25:14.1942602Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d84034c24f131de9.json (deflated 38%) 2025-12-04T12:25:14.1944127Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b21382e4a0d075d7.json (deflated 36%) 2025-12-04T12:25:14.1945650Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f01856e9a2028bff.json (deflated 37%) 2025-12-04T12:25:14.1947156Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d271f82508cdd35e.json (deflated 45%) 2025-12-04T12:25:14.1948766Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-602ab3c67d585e00.json (deflated 45%) 2025-12-04T12:25:14.1950250Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c4b4f500cbe46b2.json (deflated 43%) 2025-12-04T12:25:14.1951653Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-060bfe393d18a7b7.json (deflated 37%) 2025-12-04T12:25:14.1953006Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-08a6cb454dfb3288.json (deflated 37%) 2025-12-04T12:25:14.1954345Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14f8591ab0b18d47.json (deflated 44%) 2025-12-04T12:25:14.1955697Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faf65bc8adad7023.json (deflated 37%) 2025-12-04T12:25:14.1957051Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ab921a38daba1bb.json (deflated 44%) 2025-12-04T12:25:14.1958399Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205a17c445d16b08.json (deflated 38%) 2025-12-04T12:25:14.1959781Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14314f5e6064defd.json (deflated 47%) 2025-12-04T12:25:14.1961154Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9a98077fc0a28449.json (deflated 37%) 2025-12-04T12:25:14.1962511Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e2de3e4d8afa5ff.json (deflated 56%) 2025-12-04T12:25:14.1963860Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-512586046bd1af6f.json (deflated 37%) 2025-12-04T12:25:14.1965208Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1fa69b7512f74eae.json (deflated 37%) 2025-12-04T12:25:14.1966555Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-70138f82b180a3f5.json (deflated 43%) 2025-12-04T12:25:14.1967894Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b7ed61d0627f9533.json (deflated 37%) 2025-12-04T12:25:14.1969235Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-493e10e45797f8fa.json (deflated 37%) 2025-12-04T12:25:14.1970582Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-87c65811f60e5e0f.json (deflated 37%) 2025-12-04T12:25:14.1971931Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-635f35dfbbc33c85.json (deflated 43%) 2025-12-04T12:25:14.1973266Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-355930f4da4ab18f.json (deflated 44%) 2025-12-04T12:25:14.1974613Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6333fa7d0fe5c91.json (deflated 38%) 2025-12-04T12:25:14.1975965Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3076e5b00c0eef07.json (deflated 43%) 2025-12-04T12:25:14.1977636Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9141798051401a79.json (deflated 37%) 2025-12-04T12:25:14.1979151Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d96c5808f2f4d423.json (deflated 37%) 2025-12-04T12:25:14.1980660Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-59eca95b80bf15e4.json (deflated 37%) 2025-12-04T12:25:14.1982174Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eeb7f329dcb1625.json (deflated 46%) 2025-12-04T12:25:14.1983729Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c438893677b09839.json (deflated 45%) 2025-12-04T12:25:14.1985247Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d707ddf229008c6a.json (deflated 37%) 2025-12-04T12:25:14.1986755Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c31ce4d4db4e93a.json (deflated 37%) 2025-12-04T12:25:14.1988262Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-714862760bd05954.json (deflated 38%) 2025-12-04T12:25:14.1989779Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16429bc307938d70.json (deflated 37%) 2025-12-04T12:25:14.1991115Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-92f77f3d8cd66053.json (deflated 37%) 2025-12-04T12:25:14.1992499Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deed4e34c84ee498.json (deflated 45%) 2025-12-04T12:25:14.1993846Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-425b9693fd331423.json (deflated 36%) 2025-12-04T12:25:14.1995176Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9149f9baa8d84141.json (deflated 43%) 2025-12-04T12:25:14.1996505Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d5cc488c73d225.json (deflated 43%) 2025-12-04T12:25:14.1997840Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-017a63f22f7a2e26.json (deflated 36%) 2025-12-04T12:25:14.1999161Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e6391f21f8fa7c0.json (deflated 36%) 2025-12-04T12:25:14.2000502Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9e8b675076ef3915.json (deflated 37%) 2025-12-04T12:25:14.2001843Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b8d64d4666fb6c9d.json (deflated 37%) 2025-12-04T12:25:14.2003188Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0dee982caae0bf52.json (deflated 36%) 2025-12-04T12:25:14.2004508Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0df7122c519ced4f.json (deflated 37%) 2025-12-04T12:25:14.2005847Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2827e400085e914f.json (deflated 44%) 2025-12-04T12:25:14.2007179Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d39e0b557433741.json (deflated 45%) 2025-12-04T12:25:14.2008513Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6c5067f69c5dc42.json (deflated 44%) 2025-12-04T12:25:14.2010067Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d40c5c296523fcf4.json (deflated 44%) 2025-12-04T12:25:14.2011467Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e19c088745912810.json (deflated 37%) 2025-12-04T12:25:14.2012872Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-21b633b88362af20.json (deflated 37%) 2025-12-04T12:25:14.2014316Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f1d69885e8023d73.json (deflated 37%) 2025-12-04T12:25:14.2015726Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76455ff9fe96f12c.json (deflated 37%) 2025-12-04T12:25:14.2017388Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9224f6b7ff8b973c.json (deflated 37%) 2025-12-04T12:25:14.2018891Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64019cd840b5ae37.json (deflated 44%) 2025-12-04T12:25:14.2020392Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c52c688cda6423d1.json (deflated 44%) 2025-12-04T12:25:14.2022062Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-56aae62a7e88ec0a.json (deflated 37%) 2025-12-04T12:25:14.2023674Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-126517b1e280f193.json (deflated 37%) 2025-12-04T12:25:14.2025203Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d346d213506e58a.json (deflated 37%) 2025-12-04T12:25:14.2026707Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-093f4d1e23acb10f.json (deflated 57%) 2025-12-04T12:25:14.2028208Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-810e1605bd5350e8.json (deflated 38%) 2025-12-04T12:25:14.2029701Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-43db9cfa18063736.json (deflated 37%) 2025-12-04T12:25:14.2031196Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d256d1cc46d8d8d.json (deflated 37%) 2025-12-04T12:25:14.2032706Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0174602e3f0dc49.json (deflated 42%) 2025-12-04T12:25:14.2034161Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d15167d0a9773e6.json (deflated 37%) 2025-12-04T12:25:14.2035491Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2a355bd7e8aa2084.json (deflated 37%) 2025-12-04T12:25:14.2036818Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a694586bb28814d4.json (deflated 38%) 2025-12-04T12:25:14.2038133Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-91f11f0cc30a0889.json (deflated 37%) 2025-12-04T12:25:14.2039473Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc882534d0c7ac9e.json (deflated 36%) 2025-12-04T12:25:14.2040995Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3576431fa0a79154.json (deflated 37%) 2025-12-04T12:25:14.2042485Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e1893ad67dccf3.json (deflated 36%) 2025-12-04T12:25:14.2043946Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-148510b891c749c6.json (deflated 37%) 2025-12-04T12:25:14.2045371Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6549972a7efaf11.json (deflated 37%) 2025-12-04T12:25:14.2047026Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ea6ea860d10e295.json (deflated 37%) 2025-12-04T12:25:14.2048486Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-83ab4f7124e50996.json (deflated 37%) 2025-12-04T12:25:14.2049930Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6c1a924e8712f89.json (deflated 44%) 2025-12-04T12:25:14.2051393Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0bec6d0d6dd273b2.json (deflated 37%) 2025-12-04T12:25:14.2052952Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ce5c2131a079a118.json (deflated 37%) 2025-12-04T12:25:14.2054366Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9aa0d7a04a1b05f2.json (deflated 44%) 2025-12-04T12:25:14.2055827Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e0e890e418ce3a.json (deflated 45%) 2025-12-04T12:25:14.2058209Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4cffe073269e4f0a.json (deflated 43%) 2025-12-04T12:25:14.2059721Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fb78beccd38dd26e.json (deflated 42%) 2025-12-04T12:25:14.2061224Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c24763a200436369.json (deflated 37%) 2025-12-04T12:25:14.2062717Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95f84fd6ea33eee0.json (deflated 48%) 2025-12-04T12:25:14.2064217Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-88fe6d3cec93de32.json (deflated 36%) 2025-12-04T12:25:14.2065725Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0260bf01f397061e.json (deflated 37%) 2025-12-04T12:25:14.2067227Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc07ca8676eed412.json (deflated 37%) 2025-12-04T12:25:14.2068836Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c73c9ddbbd799146.json (deflated 43%) 2025-12-04T12:25:14.2070289Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d73e4a124891508d.json (deflated 37%) 2025-12-04T12:25:14.2071615Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e44eef95a4d81dc3.json (deflated 37%) 2025-12-04T12:25:14.2072957Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d0f5373874b1c4.json (deflated 37%) 2025-12-04T12:25:14.2074285Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c88483e90b04648.json (deflated 37%) 2025-12-04T12:25:14.2075619Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccf199cbc8b611ab.json (deflated 37%) 2025-12-04T12:25:14.2076951Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6a4daccc9da30cdb.json (deflated 37%) 2025-12-04T12:25:14.2078300Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d983aecef8c58dfb.json (deflated 37%) 2025-12-04T12:25:14.2079643Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-746325984b31e17e.json (deflated 44%) 2025-12-04T12:25:14.2081014Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b8591cc84ef2a6a.json (deflated 43%) 2025-12-04T12:25:14.2082338Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4d97d092b2123a2.json (deflated 38%) 2025-12-04T12:25:14.2083666Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1574030634816010.json (deflated 37%) 2025-12-04T12:25:14.2084991Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5fa3a6eb60f4eca4.json (deflated 38%) 2025-12-04T12:25:14.2086327Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e754e92f5037c52.json (deflated 36%) 2025-12-04T12:25:14.2087658Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-020049def8c5b0a9.json (deflated 43%) 2025-12-04T12:25:14.2089049Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d4dd04eda8983093.json (deflated 36%) 2025-12-04T12:25:14.2090405Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a612b5b9d29cdf4.json (deflated 37%) 2025-12-04T12:25:14.2091740Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f0f750f594e5734b.json (deflated 43%) 2025-12-04T12:25:14.2093079Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7cb1e30e8a2e57ea.json (deflated 43%) 2025-12-04T12:25:14.2094403Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc8052641a24d5dc.json (deflated 44%) 2025-12-04T12:25:14.2095751Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d8cbbb1187ec0f64.json (deflated 37%) 2025-12-04T12:25:14.2097376Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83af7e95786df72.json (deflated 37%) 2025-12-04T12:25:14.2098882Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a731f1e0a2629b95.json (deflated 44%) 2025-12-04T12:25:14.2100388Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ae47b09c2c50f23.json (deflated 42%) 2025-12-04T12:25:14.2101896Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ec880e83b34c8e36.json (deflated 47%) 2025-12-04T12:25:14.2103403Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c3833fdae73dbf3c.json (deflated 48%) 2025-12-04T12:25:14.2104928Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-86aa7d82374c9e5b.json (deflated 56%) 2025-12-04T12:25:14.2106439Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a10e426b5fcbde30.json (deflated 37%) 2025-12-04T12:25:14.2107936Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ff35c7e5488dd9ac.json (deflated 37%) 2025-12-04T12:25:14.2109492Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-924d345c27601ea8.json (deflated 44%) 2025-12-04T12:25:14.2110825Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1681683ab3d327ac.json (deflated 37%) 2025-12-04T12:25:14.2112201Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22e9fd6e5aba0f0d.json (deflated 37%) 2025-12-04T12:25:14.2113535Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9dffcfba1bc1e60.json (deflated 37%) 2025-12-04T12:25:14.2114878Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1b652ce23cebda63.json (deflated 37%) 2025-12-04T12:25:14.2116221Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5b9a6fa991ecf1c.json (deflated 44%) 2025-12-04T12:25:14.2117553Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f3a9e9304d25446.json (deflated 45%) 2025-12-04T12:25:14.2118896Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0390eeced956f562.json (deflated 37%) 2025-12-04T12:25:14.2120273Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-439532956daa54d1.json (deflated 43%) 2025-12-04T12:25:14.2122015Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0f977aa3cd3cecaf.json (deflated 42%) 2025-12-04T12:25:14.2123514Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-24127363c11860de.json (deflated 42%) 2025-12-04T12:25:14.2125011Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0cd422e8a222e606.json (deflated 37%) 2025-12-04T12:25:14.2126503Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27b9de38969ee6f6.json (deflated 37%) 2025-12-04T12:25:14.2128015Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62abfea4d6932c1e.json (deflated 37%) 2025-12-04T12:25:14.2129534Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d86e179dbef96adf.json (deflated 37%) 2025-12-04T12:25:14.2131060Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6abc3b994eecaab.json (deflated 38%) 2025-12-04T12:25:14.2132572Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f8fe4b288348a5e8.json (deflated 37%) 2025-12-04T12:25:14.2134146Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e1865fe4cd352327.json (deflated 37%) 2025-12-04T12:25:14.2135574Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d135dba3284d9dd.json (deflated 45%) 2025-12-04T12:25:14.2137239Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ce519dd6997621a.json (deflated 37%) 2025-12-04T12:25:14.2138761Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d25b88aa16186c5.json (deflated 43%) 2025-12-04T12:25:14.2140261Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2b545a8cfb56682b.json (deflated 43%) 2025-12-04T12:25:14.2141765Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-96320154d0a3f580.json (deflated 36%) 2025-12-04T12:25:14.2143281Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d58d0eb09203fc2c.json (deflated 36%) 2025-12-04T12:25:14.2144798Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76e7132ba7ac5de0.json (deflated 37%) 2025-12-04T12:25:14.2146390Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a537f0ef8ed460d9.json (deflated 36%) 2025-12-04T12:25:14.2147893Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c40fad651035635.json (deflated 36%) 2025-12-04T12:25:14.2149685Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68c5b031d9a5ae9e.json (deflated 36%) 2025-12-04T12:25:14.2151029Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-712b0b28be8414a0.json (deflated 44%) 2025-12-04T12:25:14.2152367Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eca96992921c511.json (deflated 45%) 2025-12-04T12:25:14.2153697Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7834531011d91518.json (deflated 44%) 2025-12-04T12:25:14.2155142Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68f03a926c8d2bd9.json (deflated 44%) 2025-12-04T12:25:14.2156492Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e49faae68d1ac0d9.json (deflated 37%) 2025-12-04T12:25:14.2157836Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc4d026c52898da8.json (deflated 37%) 2025-12-04T12:25:14.2159170Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03eaa4726076d233.json (deflated 37%) 2025-12-04T12:25:14.2160505Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d471afa2e27428d.json (deflated 37%) 2025-12-04T12:25:14.2161855Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-065a466bb3b41d27.json (deflated 37%) 2025-12-04T12:25:14.2163200Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f328e482896672aa.json (deflated 44%) 2025-12-04T12:25:14.2164550Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee7ee7e277bba08f.json (deflated 44%) 2025-12-04T12:25:14.2165890Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e55ae93852ba5a41.json (deflated 37%) 2025-12-04T12:25:14.2167238Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6750ff7d9a08403d.json (deflated 37%) 2025-12-04T12:25:14.2168577Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d85fe03caf11b880.json (deflated 37%) 2025-12-04T12:25:14.2169924Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f90e1eb29ec7a7eb.json (deflated 57%) 2025-12-04T12:25:14.2171272Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c515ad73db9ec0f.json (deflated 38%) 2025-12-04T12:25:14.2172604Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-be5d3342961d1397.json (deflated 37%) 2025-12-04T12:25:14.2173942Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81a8ca35b73b2608.json (deflated 37%) 2025-12-04T12:25:14.2175286Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6eb3b25e1011068f.json (deflated 42%) 2025-12-04T12:25:14.2176900Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16ab3c0f531a2710.json (deflated 37%) 2025-12-04T12:25:14.2178414Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e912af285a88a53.json (deflated 37%) 2025-12-04T12:25:14.2179932Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-043dda7312ce02a9.json (deflated 38%) 2025-12-04T12:25:14.2181441Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3cf2335721c75edb.json (deflated 37%) 2025-12-04T12:25:14.2182965Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed68ee99b507df29.json (deflated 36%) 2025-12-04T12:25:14.2184490Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-afe3aa9ea643db5b.json (deflated 37%) 2025-12-04T12:25:14.2186069Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-706ef1f553cb8cca.json (deflated 37%) 2025-12-04T12:25:14.2187617Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a98124b8f8d7b3ef.json (deflated 37%) 2025-12-04T12:25:14.2189225Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee37bb64a8e84ec5.json (deflated 37%) 2025-12-04T12:25:14.2190570Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2af230e2fec6d35.json (deflated 37%) 2025-12-04T12:25:14.2191900Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3008545966a2ad5b.json (deflated 37%) 2025-12-04T12:25:14.2193246Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53870facd803211b.json (deflated 44%) 2025-12-04T12:25:14.2194590Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4eca7697caf90c2a.json (deflated 37%) 2025-12-04T12:25:14.2195936Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c4554d604268fb5.json (deflated 37%) 2025-12-04T12:25:14.2197685Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c6b52be0b4531e90.json (deflated 44%) 2025-12-04T12:25:14.2199143Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c63a3f0987273dba.json (deflated 45%) 2025-12-04T12:25:14.2200648Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b58af3771e34dd96.json (deflated 43%) 2025-12-04T12:25:14.2202136Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-587b09149e6cc83f.json (deflated 42%) 2025-12-04T12:25:14.2203612Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e3786dc33e6abd50.json (deflated 37%) 2025-12-04T12:25:14.2205069Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dfce7e92d72e48a2.json (deflated 48%) 2025-12-04T12:25:14.2206544Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-627617d506ff1d2f.json (deflated 36%) 2025-12-04T12:25:14.2208015Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64530dfd24199eb7.json (deflated 37%) 2025-12-04T12:25:14.2209518Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ddc33c5ddc10dde.json (deflated 37%) 2025-12-04T12:25:14.2210990Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0632db0896072cf.json (deflated 43%) 2025-12-04T12:25:14.2212450Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-edeb0bbc0394ec67.json (deflated 37%) 2025-12-04T12:25:14.2213919Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e515d47fe2e6fb9c.json (deflated 37%) 2025-12-04T12:25:14.2215388Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c4f0278f004bb5c.json (deflated 37%) 2025-12-04T12:25:14.2217117Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c0d3bae257da8444.json (deflated 37%) 2025-12-04T12:25:14.2218697Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7025af433f00efbb.json (deflated 37%) 2025-12-04T12:25:14.2220257Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49fd198402d5c655.json (deflated 37%) 2025-12-04T12:25:14.2221968Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5277c0b0a803851c.json (deflated 37%) 2025-12-04T12:25:14.2223494Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d4c61b2ce73c677.json (deflated 44%) 2025-12-04T12:25:14.2225022Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb0710cc3c031aa2.json (deflated 43%) 2025-12-04T12:25:14.2226549Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e4cf4d2497acecc4.json (deflated 38%) 2025-12-04T12:25:14.2228087Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0b71a9d976366a8.json (deflated 37%) 2025-12-04T12:25:14.2229611Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8c2b944477a517c5.json (deflated 38%) 2025-12-04T12:25:14.2231130Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c7a620380978373.json (deflated 36%) 2025-12-04T12:25:14.2232741Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aaa461eddd2a0f5.json (deflated 43%) 2025-12-04T12:25:14.2234746Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5c5af8107d86770.json (deflated 36%) 2025-12-04T12:25:14.2236190Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-629d0d3ddf4c3e06.json (deflated 37%) 2025-12-04T12:25:14.2237637Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7350065f0535f01a.json (deflated 43%) 2025-12-04T12:25:14.2239073Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-877f842d3f2815af.json (deflated 43%) 2025-12-04T12:25:14.2240500Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c391387e4c62daf7.json (deflated 44%) 2025-12-04T12:25:14.2241936Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cea6ac435fa81670.json (deflated 37%) 2025-12-04T12:25:14.2243466Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69f0ceb782ba322d.json (deflated 37%) 2025-12-04T12:25:14.2244913Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-354a8796ee4ffd32.json (deflated 44%) 2025-12-04T12:25:14.2246335Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a60b9c4e3ec8c5.json (deflated 42%) 2025-12-04T12:25:14.2247765Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-576d152cd04ca1c5.json (deflated 47%) 2025-12-04T12:25:14.2249193Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5733f17598591d18.json (deflated 48%) 2025-12-04T12:25:14.2250630Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d06b92a9ae7d27c.json (deflated 56%) 2025-12-04T12:25:14.2252132Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ebef8e69977ebea2.json (deflated 37%) 2025-12-04T12:25:14.2253598Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea6c158c65373811.json (deflated 37%) 2025-12-04T12:25:14.2255028Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2ff679811871b4a.json (deflated 44%) 2025-12-04T12:25:14.2256533Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc9e37194800f0d1.json (deflated 37%) 2025-12-04T12:25:14.2258206Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5145615a66bd578b.json (deflated 37%) 2025-12-04T12:25:14.2259720Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-33b7f705a30ded9f.json (deflated 37%) 2025-12-04T12:25:14.2261250Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca496a8780de69f3.json (deflated 37%) 2025-12-04T12:25:14.2262777Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8bec3baffba656ff.json (deflated 44%) 2025-12-04T12:25:14.2264301Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c836ef383c971ad8.json (deflated 45%) 2025-12-04T12:25:14.2265824Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deb32df1c36c795c.json (deflated 37%) 2025-12-04T12:25:14.2267343Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6dabff71918e7b99.json (deflated 43%) 2025-12-04T12:25:14.2268872Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca39e437f793eab2.json (deflated 42%) 2025-12-04T12:25:14.2270353Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d93f79d5e733c01.json (deflated 42%) 2025-12-04T12:25:14.2271704Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2079ea64f821f40e.json (deflated 37%) 2025-12-04T12:25:14.2273048Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eb15a6e33c260556.json (deflated 37%) 2025-12-04T12:25:14.2274402Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ae1eb5639088ccd8.json (deflated 37%)
2025-12-04T12:25:14.2295538Z ##[group]Run # Remove any previous test reports if they exist
2025-12-04T12:25:14.2296095Z # Remove any previous test reports if they exist
2025-12-04T12:25:14.2296614Z rm -f test-reports-*.zip
2025-12-04T12:25:14.2297315Z zip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv'
2025-12-04T12:25:14.2304114Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T12:25:14.2304557Z env:
2025-12-04T12:25:14.2304812Z GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:14.2305108Z HAS_NVIDIA_GPU: true
2025-12-04T12:25:14.2305473Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:14.2306121Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:14.2306919Z FILE_SUFFIX: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904
2025-12-04T12:25:14.2307478Z ##[endgroup]
2025-12-04T12:25:14.2475667Z adding: test/test-reports/python-pytest/distributed.test_c10d_functional_native/distributed.test_c10d_functional_native-369cc3de9e188dd1.xml (deflated 89%)
2025-12-04T12:25:14.2477377Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-39c8c10a0ef1a34e.xml (deflated 77%)
2025-12-04T12:25:14.2478848Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-bb36a88bac557029.xml (deflated 77%)
2025-12-04T12:25:14.2480266Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-9b6f6e417d9b4600.xml (deflated 77%)
2025-12-04T12:25:14.2481674Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-83c25fe932c36613.xml (deflated 28%)
2025-12-04T12:25:14.2483092Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-e1278d34de852f2a.xml (deflated 77%)
2025-12-04T12:25:14.2484512Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-efcb608498b7750d.xml (deflated 77%)
2025-12-04T12:25:14.2485944Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9a300aee582fd0b6.xml (deflated 77%)
2025-12-04T12:25:14.2487373Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-433868368b6a29b3.xml (deflated 77%)
2025-12-04T12:25:14.2488797Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-cb48c540b8fb2acf.xml (deflated 86%)
2025-12-04T12:25:14.2490219Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f306b72badd85355.xml (deflated 77%)
2025-12-04T12:25:14.2491646Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-456a3faf0e1ca4c4.xml (deflated 28%)
2025-12-04T12:25:14.2493124Z adding: test/test-reports/python-pytest/distributed.tensor.debug.test_debug_mode/distributed.tensor.debug.test_debug_mode-21dd2989918f2f32.xml (deflated 82%)
2025-12-04T12:25:14.2494624Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-93c7f0a0a61745d5.xml (deflated
77%) 2025-12-04T12:25:14.2496084Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-50fd36707db41f77.xml (deflated 77%) 2025-12-04T12:25:14.2497862Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-434f2a168fab2502.xml (deflated 77%) 2025-12-04T12:25:14.2499341Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-810575b51f00acc3.xml (deflated 77%) 2025-12-04T12:25:14.2500835Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-acd65444fa26961a.xml (deflated 77%) 2025-12-04T12:25:14.2502428Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d7f6d912312cc834.xml (deflated 77%) 2025-12-04T12:25:14.2503924Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d3fa58c4cf34965f.xml (deflated 78%) 2025-12-04T12:25:14.2505422Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5b8ecd9108f02ac.xml (deflated 86%) 2025-12-04T12:25:14.2506903Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-578e4c4077b7a803.xml (deflated 78%) 2025-12-04T12:25:14.2508395Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14d4a314808f55fe.xml (deflated 78%) 2025-12-04T12:25:14.2509960Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-72b90a4f7545df10.xml (deflated 78%) 2025-12-04T12:25:14.2511478Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cc094df1219cfd82.xml (deflated 90%) 2025-12-04T12:25:14.2512944Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-94627d53ab92538d.xml (deflated 78%) 2025-12-04T12:25:14.2514394Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f49c40cee39994b2.xml (deflated 78%) 2025-12-04T12:25:14.2515847Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a8869f6ed51873ac.xml (deflated 78%) 2025-12-04T12:25:14.2517300Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-90a4ba7c1fd04d10.xml (deflated 78%) 2025-12-04T12:25:14.2518754Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ccaa5b3b6bf09af7.xml (deflated 78%) 2025-12-04T12:25:14.2520207Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ca39f8152ef39349.xml (deflated 90%) 2025-12-04T12:25:14.2522010Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7178045a44a28781.xml (deflated 77%) 2025-12-04T12:25:14.2523501Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cdb7b80b8b392fad.xml (deflated 77%) 2025-12-04T12:25:14.2524995Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-9595731043617943.xml (deflated 86%) 2025-12-04T12:25:14.2526475Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f8bd87b046fcc0d3.xml (deflated 77%) 2025-12-04T12:25:14.2527976Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-68dc7893385d1617.xml (deflated 77%) 2025-12-04T12:25:14.2529470Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14f8a536ecccf07e.xml (deflated 77%) 2025-12-04T12:25:14.2530966Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-77e61ff77a3b19cd.xml (deflated 28%) 2025-12-04T12:25:14.2532546Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a78dec0d79621f36.xml (deflated 78%) 2025-12-04T12:25:14.2534281Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9a14ac4718e66e44.xml (deflated 78%) 2025-12-04T12:25:14.2535985Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7d115d367e840460.xml (deflated 78%) 2025-12-04T12:25:14.2537866Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-724e16d7d24ec18b.xml (deflated 78%) 2025-12-04T12:25:14.2539516Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1c81c8f34feb9c16.xml (deflated 78%) 2025-12-04T12:25:14.2541170Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a326f09bb7c5e616.xml (deflated 78%) 2025-12-04T12:25:14.2542810Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7096ae518bc839e.xml (deflated 78%) 2025-12-04T12:25:14.2544456Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-dbe06a751e4355d9.xml (deflated 78%) 2025-12-04T12:25:14.2546196Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7f21dedd43754e1.xml (deflated 78%) 2025-12-04T12:25:14.2547898Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7dbc99509eb0f4ce.xml (deflated 78%) 2025-12-04T12:25:14.2549639Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b4af92028672eb6.xml (deflated 78%) 2025-12-04T12:25:14.2551223Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c67b11ef8bde4252.xml (deflated 78%) 2025-12-04T12:25:14.2552825Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c057f5798619892b.xml (deflated 78%) 2025-12-04T12:25:14.2554430Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-aae1a2ba6806c0ef.xml (deflated 78%) 2025-12-04T12:25:14.2556039Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c34ce2d8050066e8.xml (deflated 78%) 2025-12-04T12:25:14.2557630Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-fde5b3ce12e5a98a.xml (deflated 86%) 2025-12-04T12:25:14.2559226Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b1cbedcab1229122.xml (deflated 78%) 2025-12-04T12:25:14.2560833Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6d24496891daae4f.xml (deflated 78%) 2025-12-04T12:25:14.2562445Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e815db3b6b0b67f1.xml (deflated 86%) 2025-12-04T12:25:14.2564050Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-788cdb9001b436df.xml (deflated 77%) 2025-12-04T12:25:14.2565634Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9601a812ff315158.xml (deflated 77%) 2025-12-04T12:25:14.2567225Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c4b6ce2b260b8d4b.xml (deflated 77%) 2025-12-04T12:25:14.2568831Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-490a12d48ec816b9.xml (deflated 77%) 2025-12-04T12:25:14.2570470Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2f9fc6fa3a79028.xml (deflated 77%) 2025-12-04T12:25:14.2572083Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-384ab9a5685ff7be.xml (deflated 28%) 2025-12-04T12:25:14.2573679Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a06a4188d644524d.xml (deflated 86%) 2025-12-04T12:25:14.2575269Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-03186403898f3bbb.xml (deflated 86%) 2025-12-04T12:25:14.2576902Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a3dc994784795bc1.xml (deflated 77%) 2025-12-04T12:25:14.2578705Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-b1d6139c1033a518.xml (deflated 77%) 2025-12-04T12:25:14.2580321Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-ebdc3db326996caa.xml (deflated 77%) 2025-12-04T12:25:14.2581877Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-c42bc725a7562377.xml (deflated 77%) 2025-12-04T12:25:14.2583433Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-4818210284e31d5e.xml (deflated 77%) 2025-12-04T12:25:14.2584993Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-1b5186457c75b3fb.xml (deflated 86%) 2025-12-04T12:25:14.2586553Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-74e02afb5846363a.xml (deflated 77%) 2025-12-04T12:25:14.2588106Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-39202840e4782b07.xml (deflated 90%) 2025-12-04T12:25:14.2589755Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-067163aa862fde85.xml (deflated 90%) 2025-12-04T12:25:14.2591274Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-adf2403f35f3c235.xml (deflated 90%) 2025-12-04T12:25:14.2592780Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-36b91fd354097cab.xml (deflated 28%) 2025-12-04T12:25:14.2594210Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90a070d9a0caeaa7.xml (deflated 77%) 2025-12-04T12:25:14.2595556Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b56b818e7dab969.xml (deflated 86%) 2025-12-04T12:25:14.2596909Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2da5f79ab7711605.xml (deflated 86%) 2025-12-04T12:25:14.2598259Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a202ac92fafcf85d.xml (deflated 77%) 2025-12-04T12:25:14.2599593Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bacdfd4e137b31c0.xml (deflated 86%) 2025-12-04T12:25:14.2600947Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2f84fddbafa0e0f3.xml (deflated 77%) 2025-12-04T12:25:14.2602277Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8511307d41418b77.xml (deflated 78%) 2025-12-04T12:25:14.2603651Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3768a5b2a44119fc.xml (deflated 78%) 2025-12-04T12:25:14.2605000Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31ee953fde08a139.xml (deflated 78%) 2025-12-04T12:25:14.2606341Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cf0a0887fe85c292.xml (deflated 77%) 2025-12-04T12:25:14.2607668Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-07c27c95d6f3d3d6.xml (deflated 77%) 2025-12-04T12:25:14.2609011Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ec3b2535e8e2ad7.xml (deflated 77%) 2025-12-04T12:25:14.2610361Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c7bc1bec56d6360.xml (deflated 86%) 2025-12-04T12:25:14.2611757Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1003ee713f2c1e3e.xml (deflated 77%) 2025-12-04T12:25:14.2613116Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-86ef8482fc5a0e9d.xml (deflated 77%) 2025-12-04T12:25:14.2614453Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e9238188d8477a2.xml (deflated 86%) 2025-12-04T12:25:14.2615787Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9476e56094f0b738.xml (deflated 77%) 2025-12-04T12:25:14.2617377Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-207ff9590d724b3a.xml (deflated 77%) 2025-12-04T12:25:14.2618752Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f664e87214ff2805.xml (deflated 77%) 2025-12-04T12:25:14.2620132Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-def950b7d24ceea9.xml (deflated 77%) 2025-12-04T12:25:14.2621709Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-89dfbd7b5cd71317.xml (deflated 77%) 2025-12-04T12:25:14.2623099Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bdae057bafb686b9.xml (deflated 86%) 2025-12-04T12:25:14.2624480Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-eb4953947b5f3ef2.xml (deflated 77%) 2025-12-04T12:25:14.2625842Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-532f83d54e2054ff.xml (deflated 90%) 2025-12-04T12:25:14.2627222Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3483d762b5b4fca1.xml (deflated 77%) 2025-12-04T12:25:14.2628617Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c6b2032ef8ff1e94.xml (deflated 86%) 2025-12-04T12:25:14.2629997Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5647de3303d26f02.xml (deflated 77%) 2025-12-04T12:25:14.2631371Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cff7e7504b276d84.xml (deflated 86%) 2025-12-04T12:25:14.2632858Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d2fb83ab3ccdeb6.xml (deflated 77%) 2025-12-04T12:25:14.2634199Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd911142cc34300e.xml (deflated 90%) 2025-12-04T12:25:14.2635532Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8e84025a0dc7a16.xml (deflated 90%) 2025-12-04T12:25:14.2636931Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-392d2e7951c1c5f3.xml (deflated 77%) 2025-12-04T12:25:14.2638282Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-477ee10c9167da98.xml (deflated 90%) 2025-12-04T12:25:14.2639623Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96eeb012f5f596ba.xml (deflated 77%) 2025-12-04T12:25:14.2640968Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc37fd9d84da442a.xml (deflated 77%) 2025-12-04T12:25:14.2642343Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cbd7e5f481e859be.xml (deflated 77%) 2025-12-04T12:25:14.2643684Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ede249f1a681285.xml (deflated 77%) 2025-12-04T12:25:14.2645083Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-11be05c94e086d26.xml (deflated 77%) 2025-12-04T12:25:14.2646474Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16966e8ed8e62900.xml (deflated 77%) 2025-12-04T12:25:14.2647814Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90420efea6f00dc5.xml (deflated 77%) 2025-12-04T12:25:14.2649165Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c9f36ab2b8b15ae.xml (deflated 77%) 2025-12-04T12:25:14.2650501Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d4c1fd96adc2be7.xml (deflated 86%) 2025-12-04T12:25:14.2651849Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-500277f28031837e.xml (deflated 77%) 2025-12-04T12:25:14.2653192Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-942d56c07e16c88d.xml (deflated 77%) 2025-12-04T12:25:14.2654534Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-55fdf9ad8e0a27f0.xml (deflated 77%) 2025-12-04T12:25:14.2655881Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e1cdaa245647d1a.xml (deflated 77%) 2025-12-04T12:25:14.2657467Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a996648fbbff19f5.xml (deflated 77%) 2025-12-04T12:25:14.2658851Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc1573489c80017b.xml (deflated 77%) 2025-12-04T12:25:14.2660240Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4d2b72d464b1c339.xml (deflated 78%) 2025-12-04T12:25:14.2661633Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65dbafa4918c0ef1.xml (deflated 78%) 2025-12-04T12:25:14.2663009Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8e1f7dea233320.xml (deflated 90%) 2025-12-04T12:25:14.2664389Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d13641fc6f0b57c.xml (deflated 78%) 2025-12-04T12:25:14.2665768Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29e66d82c97dbaa5.xml (deflated 78%) 2025-12-04T12:25:14.2667162Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a798bbedf3e7b999.xml (deflated 90%) 2025-12-04T12:25:14.2668549Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0d5d8a174cb3c98.xml (deflated 86%) 2025-12-04T12:25:14.2670054Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-931d013fb4c2579a.xml (deflated 90%) 2025-12-04T12:25:14.2671387Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-92646f491493cae0.xml (deflated 78%) 2025-12-04T12:25:14.2672721Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8232c23afc6466e0.xml (deflated 77%) 2025-12-04T12:25:14.2674056Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-983af60bcd722f1d.xml (deflated 77%) 2025-12-04T12:25:14.2675400Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-84ede3fbd174dfda.xml (deflated 77%) 2025-12-04T12:25:14.2676011Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9538bfd24f807d16.xml (deflated 86%) 2025-12-04T12:25:14.2676671Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e7d2c56cd2be4bb.xml (deflated 86%) 2025-12-04T12:25:14.2677296Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1378f62336ac1630.xml (deflated 77%) 2025-12-04T12:25:14.2677910Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e092965a6aa7362.xml (deflated 86%) 2025-12-04T12:25:14.2678510Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19aef0a0802c58a7.xml (deflated 77%) 2025-12-04T12:25:14.2679115Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e8c70689f4db333.xml (deflated 86%) 2025-12-04T12:25:14.2679704Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-389219a70e101b44.xml (deflated 77%) 2025-12-04T12:25:14.2680317Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22aad73f608511a0.xml (deflated 86%) 2025-12-04T12:25:14.2680912Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22bb81621d944803.xml (deflated 77%) 2025-12-04T12:25:14.2681509Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e70588b2995dc7c5.xml (deflated 77%) 2025-12-04T12:25:14.2682123Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b456a18c8ca9135a.xml (deflated 77%) 2025-12-04T12:25:14.2682731Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aedba904eee3ba73.xml (deflated 77%) 2025-12-04T12:25:14.2683342Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3d36f137cb39b5.xml (deflated 77%) 2025-12-04T12:25:14.2683951Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-973a0dc84b27de93.xml (deflated 77%) 2025-12-04T12:25:14.2684554Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e9342b39aaf3792.xml (deflated 77%) 2025-12-04T12:25:14.2685162Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15b775a41cf5a439.xml (deflated 77%) 2025-12-04T12:25:14.2685782Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56374ffd8bd068de.xml (deflated 77%) 2025-12-04T12:25:14.2686392Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6288913bb010f746.xml (deflated 77%) 2025-12-04T12:25:14.2687041Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d2350a2a3a63f23.xml (deflated 77%) 2025-12-04T12:25:14.2687689Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee9779088060e0f5.xml (deflated 86%) 2025-12-04T12:25:14.2688295Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a7aa8c4ec058e09.xml (deflated 77%) 2025-12-04T12:25:14.2688910Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4f45a35aeec028b0.xml (deflated 28%) 2025-12-04T12:25:14.2704929Z adding: test/test-reports/python-pytest/distributed.algorithms.test_join/distributed.algorithms.test_join-346fdf8ca2d8d04c.xml (deflated 79%) 2025-12-04T12:25:14.2705844Z adding: test/test-reports/python-pytest/distributed.pipelining.test_schedule_multiproc/distributed.pipelining.test_schedule_multiproc-4c892aab54fe07b4.xml (deflated 88%) 2025-12-04T12:25:14.2706584Z adding: test/test-reports/python-pytest/distributed.test_compute_comm_reordering/distributed.test_compute_comm_reordering-5eeb11f30d43fbd8.xml (deflated 78%) 2025-12-04T12:25:14.2707325Z adding: test/test-reports/python-pytest/distributed.test_cupy_as_tensor/distributed.test_cupy_as_tensor-9bf0be6a7af397ad.xml (deflated 47%) 2025-12-04T12:25:14.2707984Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fx/distributed.fsdp.test_fsdp_fx-d8b89ec57f22953e.xml (deflated 35%) 2025-12-04T12:25:14.2708593Z adding: test/test-reports/python-pytest/distributed._tools.test_sac_ilp/distributed._tools.test_sac_ilp-80280b96b0e30cba.xml (deflated 66%) 2025-12-04T12:25:14.2709412Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_hf_storage/distributed.checkpoint.test_hf_storage-5c05eca826b12737.xml (deflated 70%) 2025-12-04T12:25:14.2710118Z adding: test/test-reports/python-pytest/distributed.pipelining.test_microbatch/distributed.pipelining.test_microbatch-db2f7f262044cd4d.xml (deflated 58%) 2025-12-04T12:25:14.2710829Z adding: test/test-reports/python-pytest/distributed.tensor.test_placement_types/distributed.tensor.test_placement_types-aa6a82bf337fac31.xml (deflated 70%) 2025-12-04T12:25:14.2711624Z adding: test/test-reports/python-pytest/distributed.tensor.test_dtensor_dispatch_overhead/distributed.tensor.test_dtensor_dispatch_overhead-1be227e0f3a4b8ca.xml (deflated 41%) 2025-12-04T12:25:14.2712538Z adding: test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpoint_reader/distributed.checkpoint._experimental.test_checkpoint_reader-e75c494c472cf9a1.xml (deflated 67%) 2025-12-04T12:25:14.2713260Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_format_utils/distributed.checkpoint.test_format_utils-ff4efe8ffc0a39b9.xml (deflated 59%) 2025-12-04T12:25:14.2713997Z adding: test/test-reports/python-pytest/distributed.test_aten_comm_compute_reordering/distributed.test_aten_comm_compute_reordering-8ab49fa352932ba1.xml (deflated 84%) 2025-12-04T12:25:14.2714678Z adding: 
test/test-reports/python-pytest/distributed.tensor.test_redistribute/distributed.tensor.test_redistribute-02b614c0805e2900.xml (deflated 86%) 2025-12-04T12:25:14.2715404Z adding: test/test-reports/python-pytest/distributed.tensor.parallel.test_tp_style/distributed.tensor.parallel.test_tp_style-3daa17d4beb2059f.xml (deflated 82%) 2025-12-04T12:25:14.2715986Z adding: test/test-reports/python-pytest/distributed.tensor.test_api/distributed.tensor.test_api-143a55cc9757e18a.xml (deflated 82%) 2025-12-04T12:25:14.2716642Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsspec/distributed.checkpoint.test_fsspec-2295d11b632387c0.xml (deflated 54%) 2025-12-04T12:25:14.2717457Z adding: test/test-reports/python-pytest/distributed.tensor.experimental.test_tp_transform/distributed.tensor.experimental.test_tp_transform-af912528cabb656d.xml (deflated 62%) 2025-12-04T12:25:14.2718145Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_traverse/distributed.checkpoint.test_traverse-f038bc92a00bd1c7.xml (deflated 75%) 2025-12-04T12:25:14.2718823Z adding: test/test-reports/python-pytest/distributed.tensor.test_random_ops/distributed.tensor.test_random_ops-a8f6b522aa6434af.xml (deflated 86%) 2025-12-04T12:25:14.2719650Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_logging/distributed._composable.fsdp.test_fully_shard_logging-7e09cae3d59aa65e.xml (deflated 27%) 2025-12-04T12:25:14.2720249Z adding: test/test-reports/python-pytest/distributed.launcher.test_api/distributed.launcher.test_api-15b87ceaa10651c5.xml (deflated 51%) 2025-12-04T12:25:14.2721384Z adding: test/test-reports/python-pytest/distributed.elastic.multiprocessing.test_api/distributed.elastic.multiprocessing.test_api-12b95803d8942f3a.xml (deflated 75%) 2025-12-04T12:25:14.2722040Z adding: test/test-reports/python-pytest/distributed.fsdp.test_shard_utils/distributed.fsdp.test_shard_utils-76ee73cffd398e77.xml (deflated 52%) 2025-12-04T12:25:14.2722811Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_optim_state/distributed.checkpoint.test_fsdp_optim_state-f29e492ac7e0fdff.xml (deflated 55%) 2025-12-04T12:25:14.2723726Z adding: test/test-reports/python-pytest/distributed.checkpoint.e2e.test_e2e_save_and_load/distributed.checkpoint.e2e.test_e2e_save_and_load-ea436a2b3918b4b7.xml (deflated 85%) 2025-12-04T12:25:14.2724565Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_dtensor_resharding/distributed.checkpoint.test_dtensor_resharding-850e82d898db0167.xml (deflated 80%) 2025-12-04T12:25:14.2725216Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_memory/distributed.fsdp.test_fsdp_memory-bd1d93d0f6b45624.xml (deflated 53%) 2025-12-04T12:25:14.2725912Z adding: test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-8ffd5e5eb5f5ad7d.xml (deflated 84%) 2025-12-04T12:25:14.2726668Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_compatibility/distributed.checkpoint.test_compatibility-759684b03ee5bd2d.xml (deflated 75%) 2025-12-04T12:25:14.2727342Z adding: test/test-reports/python-pytest/distributed._tools.test_mem_tracker/distributed._tools.test_mem_tracker-e6bb23aea30c734a.xml (deflated 58%) 2025-12-04T12:25:14.2728046Z adding: test/test-reports/python-pytest/distributed.elastic.test_control_plane/distributed.elastic.test_control_plane-8adada293373a225.xml (deflated 74%) 2025-12-04T12:25:14.2728601Z adding: 
test/test-reports/python-pytest/distributed.test_fake_pg/distributed.test_fake_pg-79e3fe3f86c7485d.xml (deflated 82%) 2025-12-04T12:25:14.2729370Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_model_state/distributed.checkpoint.test_fsdp_model_state-d2d7dab49696755b.xml (deflated 55%) 2025-12-04T12:25:14.2730077Z adding: test/test-reports/python-pytest/distributed.test_functional_api/distributed.test_functional_api-d3092064f68d2f41.xml (deflated 78%) 2025-12-04T12:25:14.2730984Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_-2322cac9c0cc490f.xml (deflated 52%) 2025-12-04T12:25:14.2731706Z adding: test/test-reports/python-pytest/distributed.tensor.debug.test_comm_mode/distributed.tensor.debug.test_comm_mode-8cc829f047ed6143.xml (deflated 66%) 2025-12-04T12:25:14.2732241Z adding: test/test-reports/python-pytest/distributed.test_dist2/distributed.test_dist2-7a48db8512284abb.xml (deflated 89%) 2025-12-04T12:25:14.2733237Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_grad_scaler/distributed._composable.fsdp.test_fully_shard_grad_scaler-5e3c33eaf29838b0.xml (deflated 37%) 2025-12-04T12:25:14.2733844Z adding: test/test-reports/python-pytest/distributed.launcher.test_run/distributed.launcher.test_run-eeaaeb50473e3b00.xml (deflated 84%) 2025-12-04T12:25:14.2734583Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_backward_prefetch/distributed.fsdp.test_fsdp_backward_prefetch-9d6c65a3bd838e6b.xml (deflated 39%) 2025-12-04T12:25:14.2735327Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_checkpoint/distributed.checkpoint.test_checkpoint-698955a0be6378e2.xml (deflated 77%) 2025-12-04T12:25:14.2735948Z adding: test/test-reports/python-pytest/distributed._pycute.test_coalesce/distributed._pycute.test_coalesce-d2727b6d77166552.xml (deflated 38%) 2025-12-04T12:25:14.2736677Z adding: test/test-reports/python-pytest/distributed._pycute.test_complement/distributed._pycute.test_complement-323506218bd25d4f.xml (deflated 39%) 2025-12-04T12:25:14.2737521Z adding: test/test-reports/python-pytest/distributed._pycute.test_composition/distributed._pycute.test_composition-91e42d2ac7610498.xml (deflated 40%) 2025-12-04T12:25:14.2738167Z adding: test/test-reports/python-pytest/distributed._pycute.test_int_tuple/distributed._pycute.test_int_tuple-1604350619512e65.xml (deflated 82%) 2025-12-04T12:25:14.2738838Z adding: test/test-reports/python-pytest/distributed._pycute.test_left_inverse/distributed._pycute.test_left_inverse-7b550f03a54828f5.xml (deflated 38%) 2025-12-04T12:25:14.2739606Z adding: test/test-reports/python-pytest/distributed._pycute.test_right_inverse/distributed._pycute.test_right_inverse-5437f0847845b913.xml (deflated 38%) 2025-12-04T12:25:14.2740377Z adding: test/test-reports/python-pytest/distributed._composable.test_replicate/distributed._composable.test_replicate-5594e5fd77ce79b5.xml (deflated 85%) 2025-12-04T12:25:14.2741147Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_hsdp_checkpoint/distributed.checkpoint.test_hsdp_checkpoint-293bcc74b378a9a0.xml (deflated 70%) 2025-12-04T12:25:14.2741977Z adding: test/test-reports/python-pytest/distributed.tensor.parallel.test_parallelize_api/distributed.tensor.parallel.test_parallelize_api-e24bc2790e3eed77.xml (deflated 89%) 2025-12-04T12:25:14.2742648Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_state_dict/distributed.fsdp.test_fsdp_state_dict-3c13b82ce7076bc1.xml (deflated 95%) 2025-12-04T12:25:14.2743290Z adding: test/test-reports/python-pytest/distributed._pycute.test_typing/distributed._pycute.test_typing-1c9aabc95fed14a1.xml (deflated 39%) 2025-12-04T12:25:14.2743926Z adding: test/test-reports/python-pytest/distributed.test_serialization/distributed.test_serialization-5c3790edbaae9c6a.xml (deflated 70%) 2025-12-04T12:25:14.2744657Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_ignored_modules/distributed.fsdp.test_fsdp_ignored_modules-c4ab0979e06883a2.xml (deflated 78%) 2025-12-04T12:25:14.2745482Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_comm/distributed._composable.fsdp.test_fully_shard_comm-b03b971b17f9f8be.xml (deflated 82%) 2025-12-04T12:25:14.2746249Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_sharded_grad_scaler/distributed.fsdp.test_fsdp_sharded_grad_scaler-830facc45336217a.xml (deflated 90%) 2025-12-04T12:25:14.2747080Z adding: test/test-reports/python-pytest/distributed._shard.sharding_plan.test_sharding_plan/distributed._shard.sharding_plan.test_sharding_plan-86fe0d16a378ac71.xml (deflated 62%) 2025-12-04T12:25:14.2747904Z adding: test/test-reports/python-pytest/distributed._shard.sharded_optim.test_sharded_optim/distributed._shard.sharded_optim.test_sharded_optim-a8d576a6cb5a21e5.xml (deflated 54%) 2025-12-04T12:25:14.2748886Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_state_dict/distributed._composable.fsdp.test_fully_shard_state_dict-7cd1746803ec2a8b.xml (deflated 77%) 2025-12-04T12:25:14.2749483Z adding: test/test-reports/python-pytest/distributed.tensor.test_utils/distributed.tensor.test_utils-ce4dc3e67348c080.xml (deflated 82%) 2025-12-04T12:25:14.2750286Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_memory/distributed._composable.fsdp.test_fully_shard_memory-bd84ca434b9abee9.xml (deflated 54%) 2025-12-04T12:25:14.2751012Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_state_dict/distributed.checkpoint.test_state_dict-82ab38e24fe889c8.xml (deflated 84%) 2025-12-04T12:25:14.2751759Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_state_dict_utils/distributed.checkpoint.test_state_dict_utils-a19642af8d31d778.xml (deflated 75%) 2025-12-04T12:25:14.2752567Z adding: test/test-reports/python-pytest/distributed._shard.sharded_tensor.ops.test_embedding/distributed._shard.sharded_tensor.ops.test_embedding-fd33e5d9c41f35fb.xml (deflated 55%) 2025-12-04T12:25:14.2753452Z adding: test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard/distributed._shard.sharded_tensor.test_sharded_tensor_reshard-e6bc79067fb0604d.xml (deflated 59%) 2025-12-04T12:25:14.2754068Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ef4942791579d03.xml (deflated 35%) 2025-12-04T12:25:14.2754671Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-d882aa7ed351d2b7.xml (deflated 35%) 2025-12-04T12:25:14.2755327Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-e41d47243c13be74.xml (deflated 35%) 2025-12-04T12:25:14.2755963Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ed2ccb680132309.xml (deflated 36%) 2025-12-04T12:25:14.2756566Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-a86d7398eb9ff93b.xml (deflated 36%) 2025-12-04T12:25:14.2757178Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-50f191d4627fdfd2.xml (deflated 36%) 2025-12-04T12:25:14.2757775Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8cb70355957e1b4b.xml (deflated 36%) 2025-12-04T12:25:14.2758385Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-bbde3500be39702b.xml (deflated 35%) 2025-12-04T12:25:14.2758993Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-1805de606cf78685.xml (deflated 35%) 2025-12-04T12:25:14.2759594Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8a898c87fa4f8fd3.xml (deflated 35%) 2025-12-04T12:25:14.2760189Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-41764b12ccdf212e.xml (deflated 45%) 2025-12-04T12:25:14.2760776Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-aee5aa2ded024d85.xml (deflated 46%) 2025-12-04T12:25:14.2761359Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-8800a2e7b955ab16.xml (deflated 46%) 2025-12-04T12:25:14.2761956Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-3a092f5472894a7f.xml (deflated 45%) 2025-12-04T12:25:14.2762544Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-f628509e7e3f2a1f.xml (deflated 45%) 2025-12-04T12:25:14.2763141Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-c1a78b733abc6caa.xml (deflated 45%) 2025-12-04T12:25:14.2763686Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0991bf72558fb22b.xml (deflated 33%) 2025-12-04T12:25:14.2764249Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aa6ce215ba96a24c.xml (deflated 37%) 2025-12-04T12:25:14.2764790Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-16fe1d620732710b.xml (deflated 35%) 2025-12-04T12:25:14.2765365Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3fe1795a5d3e5b88.xml (deflated 35%) 2025-12-04T12:25:14.2765923Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6c7276bb9fa9eee2.xml (deflated 35%) 2025-12-04T12:25:14.2766466Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cd50578f9742b761.xml (deflated 35%) 2025-12-04T12:25:14.2767019Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5e60172a210dc8b6.xml (deflated 35%) 2025-12-04T12:25:14.2767561Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-873ae68d43267ac9.xml (deflated 35%) 2025-12-04T12:25:14.2768105Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-34c50e4612c9fea4.xml (deflated 35%) 2025-12-04T12:25:14.2768659Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d54fb6be7a931b62.xml (deflated 35%) 2025-12-04T12:25:14.2769260Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2259b8bd184524fc.xml (deflated 35%) 2025-12-04T12:25:14.2769841Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8f01caa16144b040.xml (deflated 35%) 2025-12-04T12:25:14.2770383Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-31de274c3cb59c01.xml (deflated 35%) 2025-12-04T12:25:14.2770920Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-db19637423ab0dbc.xml (deflated 36%) 2025-12-04T12:25:14.2771476Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b23ea90304491b65.xml (deflated 35%) 2025-12-04T12:25:14.2772018Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eaee01f734bb6504.xml (deflated 35%) 2025-12-04T12:25:14.2772577Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0fa860b184f8ddb6.xml (deflated 35%) 2025-12-04T12:25:14.2773124Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33cbbe588c8f840c.xml (deflated 36%) 2025-12-04T12:25:14.2773666Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-de8dc85b62067611.xml (deflated 35%) 2025-12-04T12:25:14.2774217Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0f2cd4f378b677f0.xml (deflated 35%) 2025-12-04T12:25:14.2774753Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e35b0454119a9f51.xml (deflated 35%) 2025-12-04T12:25:14.2775299Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d98cd20152af5d53.xml (deflated 35%) 2025-12-04T12:25:14.2775841Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3982ee850d6ce795.xml (deflated 35%) 2025-12-04T12:25:14.2776474Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-08455987c8f710af.xml (deflated 35%) 2025-12-04T12:25:14.2777205Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e90446a7a06b5b78.xml (deflated 36%) 2025-12-04T12:25:14.2777763Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3abd929020861bdc.xml (deflated 36%) 2025-12-04T12:25:14.2778334Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d79cb42da7e54a79.xml (deflated 36%) 2025-12-04T12:25:14.2778893Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1a14244d1e7f6bb2.xml (deflated 36%) 2025-12-04T12:25:14.2779464Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a80b6bac28c5c972.xml (deflated 35%) 2025-12-04T12:25:14.2780067Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bf45f3c093461361.xml (deflated 36%) 2025-12-04T12:25:14.2780630Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-81160b788c5abcc2.xml (deflated 35%) 2025-12-04T12:25:14.2781190Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2242d642afc7f886.xml (deflated 35%) 2025-12-04T12:25:14.2781746Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-327f840cbb3f5094.xml (deflated 37%) 2025-12-04T12:25:14.2782313Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-724f786ab432a45b.xml (deflated 36%) 2025-12-04T12:25:14.2782872Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aae15a76989ce46a.xml (deflated 36%) 2025-12-04T12:25:14.2783434Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ee273f849859fe9.xml (deflated 36%) 2025-12-04T12:25:14.2784053Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93baf128de560649.xml (deflated 36%) 2025-12-04T12:25:14.2784641Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1f85ec05eddb726d.xml (deflated 36%) 2025-12-04T12:25:14.2785205Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c9eb752317a73e18.xml (deflated 36%) 2025-12-04T12:25:14.2785761Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cedb520e520b4782.xml (deflated 36%) 2025-12-04T12:25:14.2786325Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e69dd1a2e9fba2dc.xml (deflated 36%) 2025-12-04T12:25:14.2786885Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-47c9021380160661.xml (deflated 36%) 2025-12-04T12:25:14.2787449Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-681adc1d59f04282.xml (deflated 36%) 2025-12-04T12:25:14.2788010Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1755a27e81246495.xml (deflated 37%) 2025-12-04T12:25:14.2788563Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b2036226275eb311.xml (deflated 36%) 2025-12-04T12:25:14.2789218Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f50e0fff8c24c86.xml (deflated 37%) 2025-12-04T12:25:14.2789767Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d908f57090f2acd6.xml (deflated 37%) 2025-12-04T12:25:14.2790313Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ac7a92e764fd2c8b.xml (deflated 36%) 2025-12-04T12:25:14.2790874Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2f80e6d84c47c0a7.xml (deflated 36%) 2025-12-04T12:25:14.2791416Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2042e0d50243da8a.xml (deflated 36%) 2025-12-04T12:25:14.2791966Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9adcd8663666ac.xml (deflated 36%) 2025-12-04T12:25:14.2792511Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-246370ceca8d8d8b.xml (deflated 37%) 2025-12-04T12:25:14.2793057Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f75c8f9699a93e6a.xml (deflated 36%) 2025-12-04T12:25:14.2793600Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-830d90348309a50c.xml (deflated 36%) 2025-12-04T12:25:14.2794171Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-257d76299fdbf250.xml (deflated 36%) 2025-12-04T12:25:14.2794722Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fa0b0b810d894be9.xml (deflated 36%) 2025-12-04T12:25:14.2795279Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b713da153aca8219.xml (deflated 37%) 2025-12-04T12:25:14.2795825Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-812da336a80f282a.xml (deflated 33%) 2025-12-04T12:25:14.2796374Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2be07987a59e5da5.xml (deflated 34%) 2025-12-04T12:25:14.2796916Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0d952f420fed2de5.xml (deflated 33%) 2025-12-04T12:25:14.2797469Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d29bf39728651f67.xml (deflated 34%) 2025-12-04T12:25:14.2798079Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-01e88d26c5e6aa85.xml (deflated 34%) 2025-12-04T12:25:14.2798644Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efe3194372b4e6.xml (deflated 34%) 2025-12-04T12:25:14.2799194Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ccf063a53847c36.xml (deflated 34%) 2025-12-04T12:25:14.2799736Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-72be92db0e827d7f.xml (deflated 34%) 2025-12-04T12:25:14.2800288Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-84f86de4e3aa962a.xml (deflated 34%) 2025-12-04T12:25:14.2800831Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e5c4d09fb827cb7f.xml (deflated 34%) 2025-12-04T12:25:14.2801379Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-165d83ae78886ff8.xml (deflated 35%) 2025-12-04T12:25:14.2801937Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-76f6fcd9346eff0a.xml (deflated 34%) 2025-12-04T12:25:14.2802479Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e84bdf3d05666f91.xml (deflated 34%) 2025-12-04T12:25:14.2803033Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a357bf2b1c694c62.xml (deflated 34%) 2025-12-04T12:25:14.2803577Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b1b5f73bcb8b828f.xml (deflated 34%) 2025-12-04T12:25:14.2804119Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e742397162ed9e3d.xml (deflated 34%) 2025-12-04T12:25:14.2804679Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f3a1c05a7b5c0fa8.xml (deflated 34%) 2025-12-04T12:25:14.2805228Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fcd37833b58d4bea.xml (deflated 34%) 2025-12-04T12:25:14.2805786Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e22bb2e46b3ab636.xml (deflated 34%) 2025-12-04T12:25:14.2806324Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d319014b034c95bf.xml (deflated 34%) 2025-12-04T12:25:14.2806871Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-393bf6208ab91711.xml (deflated 34%) 2025-12-04T12:25:14.2807420Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9e40b9771000a0.xml (deflated 34%) 2025-12-04T12:25:14.2807963Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d597ca27d8328fc4.xml (deflated 34%) 2025-12-04T12:25:14.2808548Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ff18cf4d50e44f39.xml (deflated 34%) 2025-12-04T12:25:14.2809092Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0be906a8969ec101.xml (deflated 34%) 2025-12-04T12:25:14.2809637Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-158f1ad05ae2a64b.xml (deflated 34%) 2025-12-04T12:25:14.2810195Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87453a67a1ebaea6.xml (deflated 34%) 2025-12-04T12:25:14.2810736Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-94f3fac53aec8990.xml (deflated 34%) 2025-12-04T12:25:14.2811284Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93576123b2405b32.xml (deflated 35%) 2025-12-04T12:25:14.2811823Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f6666d1683ab3f1d.xml (deflated 34%) 2025-12-04T12:25:14.2812418Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-54b039aca43fe5b7.xml (deflated 34%) 2025-12-04T12:25:14.2812999Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8eea24e340cd482b.xml (deflated 34%) 2025-12-04T12:25:14.2813543Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-abf845b544fb7d20.xml (deflated 35%) 2025-12-04T12:25:14.2814095Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f27d8d563aeff333.xml (deflated 34%) 2025-12-04T12:25:14.2814638Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b98a8d5dfa728efd.xml (deflated 35%) 2025-12-04T12:25:14.2815187Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f9a146a8fac2af4d.xml (deflated 35%) 2025-12-04T12:25:14.2815732Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d8bb6ca9e3ae378b.xml (deflated 34%) 2025-12-04T12:25:14.2816277Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-604db34ae5cbb6b2.xml (deflated 34%) 2025-12-04T12:25:14.2817070Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6d6d34df2e34630b.xml (deflated 35%) 2025-12-04T12:25:14.2817666Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-520dfe050df69b4b.xml (deflated 35%) 2025-12-04T12:25:14.2818234Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2074cd035f8dc8fc.xml (deflated 35%) 2025-12-04T12:25:14.2818791Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-468dffdf4603fb37.xml (deflated 35%) 2025-12-04T12:25:14.2819353Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fb8500504162f453.xml (deflated 35%) 2025-12-04T12:25:14.2819921Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-56d2f4c749889dbc.xml (deflated 35%) 2025-12-04T12:25:14.2820484Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8cef0d6061a45be8.xml (deflated 34%) 2025-12-04T12:25:14.2821233Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93d1d438aff7bb95.xml (deflated 35%) 2025-12-04T12:25:14.2821793Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5c11159a66fb94a9.xml (deflated 35%) 2025-12-04T12:25:14.2822359Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c1ea079cea0d8e56.xml (deflated 35%) 2025-12-04T12:25:14.2823070Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f25b64af298ca601.xml (deflated 35%) 2025-12-04T12:25:14.2823631Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87383ac3904bfe89.xml (deflated 35%) 2025-12-04T12:25:14.2824201Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d793a1fedd0d4f15.xml (deflated 35%) 2025-12-04T12:25:14.2824754Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b67795a049190b1d.xml (deflated 34%) 2025-12-04T12:25:14.2825311Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bde1923c97f63381.xml (deflated 35%) 2025-12-04T12:25:14.2825869Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2540c713fc68453d.xml (deflated 35%) 2025-12-04T12:25:14.2826440Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8d1d058689da62ff.xml (deflated 47%) 2025-12-04T12:25:14.2827082Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0c93a8978347968a.xml (deflated 35%) 2025-12-04T12:25:14.2827678Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-18641772917d69fc.xml (deflated 34%) 2025-12-04T12:25:14.2828250Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6a77c9a2c337df36.xml (deflated 35%) 2025-12-04T12:25:14.2828817Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efbb19e469ebb7.xml (deflated 34%) 2025-12-04T12:25:14.2829377Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eac363af2c24f931.xml (deflated 35%) 2025-12-04T12:25:14.2829948Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33bf8b4540a40636.xml (deflated 35%) 2025-12-04T12:25:14.2830514Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-45778cf420dbd19f.xml (deflated 36%) 2025-12-04T12:25:14.2831091Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-7dfffc535a3e90f1.xml (deflated 36%) 2025-12-04T12:25:14.2831652Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4b2795b0e7efac26.xml (deflated 36%) 2025-12-04T12:25:14.2832208Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2b369bec34855654.xml (deflated 36%) 2025-12-04T12:25:14.2832887Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d6b15d261538e27e.xml (deflated 35%) 2025-12-04T12:25:14.2833432Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ef76d7bc1711751.xml (deflated 35%) 2025-12-04T12:25:14.2833981Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0343427a5558824f.xml (deflated 33%) 2025-12-04T12:25:14.2834535Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f70a63e56a4848b.xml (deflated 34%) 2025-12-04T12:25:14.2835081Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-821ac567b5ed63bc.xml (deflated 34%) 2025-12-04T12:25:14.2835904Z adding: test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor/distributed._shard.sharded_tensor.test_sharded_tensor-ae33be926ad38292.xml (deflated 91%) 2025-12-04T12:25:14.2836449Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4e483f68cef17162.xml (deflated 33%) 2025-12-04T12:25:14.2836999Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-05f5b130753b2983.xml (deflated 34%) 2025-12-04T12:25:14.2837576Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e16e53ef8db6995.xml (deflated 35%) 2025-12-04T12:25:14.2838122Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1e281dcef1930575.xml (deflated 35%) 2025-12-04T12:25:14.2838677Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2b466e71a200bcdc.xml (deflated 34%) 2025-12-04T12:25:14.2839218Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-325c8a002e1c83a2.xml (deflated 49%) 2025-12-04T12:25:14.2839772Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c0b6a576b76efd0.xml (deflated 34%) 2025-12-04T12:25:14.2840317Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e47f2e15272edbaf.xml (deflated 34%) 2025-12-04T12:25:14.2840877Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a9e19469eb1a06d4.xml (deflated 36%) 2025-12-04T12:25:14.2841473Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-df7444533096a1d8.xml (deflated 34%) 2025-12-04T12:25:14.2842055Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d87d87bc823f3dba.xml (deflated 34%) 2025-12-04T12:25:14.2842611Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4a50a5ac8cd03017.xml (deflated 36%) 2025-12-04T12:25:14.2843158Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0ae50f0e1c874ad8.xml (deflated 34%) 2025-12-04T12:25:14.2843718Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7dbf8411ea4b6ce3.xml (deflated 35%) 2025-12-04T12:25:14.2844261Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2a6114c53cde50d7.xml (deflated 34%) 2025-12-04T12:25:14.2844804Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d109d91d9cd820a7.xml (deflated 34%) 2025-12-04T12:25:14.2845363Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e589af2daee12d3.xml (deflated 34%) 2025-12-04T12:25:14.2845905Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ff536a30913e6717.xml (deflated 36%) 2025-12-04T12:25:14.2846459Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-16e8bb0ec51136f2.xml (deflated 36%) 2025-12-04T12:25:14.2847012Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-688fcf4f5f0deff2.xml (deflated 36%) 2025-12-04T12:25:14.2847559Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c2f4984a060c2ce4.xml (deflated 37%) 2025-12-04T12:25:14.2848119Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4874c9e324e6599b.xml (deflated 36%) 2025-12-04T12:25:14.2848665Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-81b232fd98a6eda2.xml (deflated 35%) 2025-12-04T12:25:14.2849232Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-dbedd4dfa730b471.xml (deflated 36%) 2025-12-04T12:25:14.2849776Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e94fe5aed063a3e7.xml (deflated 35%) 2025-12-04T12:25:14.2850314Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-191142456fb777f7.xml (deflated 36%) 2025-12-04T12:25:14.2850870Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d909bdccb7ddf2c0.xml (deflated 36%) 2025-12-04T12:25:14.2851408Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e3a4388e42e1415.xml (deflated 36%) 2025-12-04T12:25:14.2852013Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5f42a263385a17.xml (deflated 38%) 2025-12-04T12:25:14.2852553Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a6537375079d62ca.xml (deflated 36%) 2025-12-04T12:25:14.2853092Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-515a3b961a30c93e.xml (deflated 36%) 2025-12-04T12:25:14.2853639Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-247b406154c62e2b.xml (deflated 37%) 2025-12-04T12:25:14.2854178Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-54fc92777b10ce8b.xml (deflated 35%) 2025-12-04T12:25:14.2854735Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-07a5e82fccbcefb0.xml (deflated 36%) 2025-12-04T12:25:14.2855275Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-98372eb164ddb8a6.xml (deflated 37%) 2025-12-04T12:25:14.2855874Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9a91f2cdfa9f567b.xml (deflated 36%) 2025-12-04T12:25:14.2856527Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-578f1554447ed157.xml (deflated 36%) 2025-12-04T12:25:14.2857248Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-cba9e46262707896.xml (deflated 36%) 2025-12-04T12:25:14.2857826Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-b5cc6836ef1a3879.xml (deflated 35%) 2025-12-04T12:25:14.2858387Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1a086feba79f79de.xml (deflated 37%) 2025-12-04T12:25:14.2858964Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fd712f2413b91025.xml (deflated 35%) 2025-12-04T12:25:14.2859524Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e275020a83607d9.xml (deflated 45%) 2025-12-04T12:25:14.2860080Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-32cb996256d67719.xml (deflated 49%) 2025-12-04T12:25:14.2860652Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-281110f64c593b33.xml (deflated 35%) 2025-12-04T12:25:14.2861216Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ab551cc6e4b8fc0e.xml (deflated 35%) 2025-12-04T12:25:14.2861789Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-bb4b38110c51be7b.xml (deflated 36%) 2025-12-04T12:25:14.2862352Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d76cceb106b5a87a.xml (deflated 35%) 2025-12-04T12:25:14.2862920Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f5087c7fb2c85ea4.xml (deflated 35%) 2025-12-04T12:25:14.2863499Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5bf92e22e16000ae.xml (deflated 37%) 2025-12-04T12:25:14.2864064Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a2df2e6eff7daa02.xml (deflated 33%) 2025-12-04T12:25:14.2864634Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-62cf8d48558e6611.xml (deflated 48%) 2025-12-04T12:25:14.2865192Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-008b4e727f5be082.xml (deflated 33%) 2025-12-04T12:25:14.2865753Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0b38d08cedf93968.xml (deflated 34%) 2025-12-04T12:25:14.2866358Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0615767c47cb824b.xml (deflated 35%) 2025-12-04T12:25:14.2867026Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3a85b82e41e52e7b.xml (deflated 35%) 2025-12-04T12:25:14.2867595Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-670c4eb9ad8ac35a.xml (deflated 34%) 2025-12-04T12:25:14.2868156Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1ae993f40739468a.xml (deflated 34%) 2025-12-04T12:25:14.2868716Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1379655e313056b3.xml (deflated 36%) 2025-12-04T12:25:14.2869383Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17d32ccc8ec15e49.xml (deflated 35%) 2025-12-04T12:25:14.2869932Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5afe3c6d472874.xml (deflated 34%) 2025-12-04T12:25:14.2870699Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-71d8c77dbd2b6cd3.xml (deflated 35%) 2025-12-04T12:25:14.2871299Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9e93da4b49ea34dc.xml (deflated 34%) 2025-12-04T12:25:14.2871847Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-09fe633d76933c88.xml (deflated 34%) 2025-12-04T12:25:14.2872396Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4db84368319deb77.xml (deflated 35%) 2025-12-04T12:25:14.2872974Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-867c58ec01067ba4.xml (deflated 35%) 2025-12-04T12:25:14.2873540Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f4ea20dbc7c23240.xml (deflated 35%) 2025-12-04T12:25:14.2874092Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-197b01c054eb8425.xml (deflated 33%) 2025-12-04T12:25:14.2874652Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5f78ef08e5f67618.xml (deflated 35%) 2025-12-04T12:25:14.2875198Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5dd09e666c5e73ac.xml (deflated 35%) 2025-12-04T12:25:14.2875740Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8d5b24102af3938b.xml (deflated 35%) 2025-12-04T12:25:14.2876312Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7ed88178415e82af.xml (deflated 34%) 2025-12-04T12:25:14.2876861Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17ddadec6a584fc8.xml (deflated 34%) 2025-12-04T12:25:14.2877527Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-db161ee1d414a014.xml (deflated 28%) 2025-12-04T12:25:14.2878184Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aee66205f8817bd7.xml (deflated 28%) 2025-12-04T12:25:14.2878848Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4fea7b2e6cf3a65.xml (deflated 28%) 2025-12-04T12:25:14.2879509Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-422b22169e3a08f1.xml (deflated 28%) 2025-12-04T12:25:14.2880161Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ec15082b412f697.xml (deflated 27%) 2025-12-04T12:25:14.2880817Z adding: 
test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a2eda26248d83b8e.xml (deflated 28%) 2025-12-04T12:25:14.2881504Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e12df5e946a2399b.xml (deflated 27%) 2025-12-04T12:25:14.2882157Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4ab25792bd6780ce.xml (deflated 28%) 2025-12-04T12:25:14.2882811Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee61fca4ae363844.xml (deflated 28%) 2025-12-04T12:25:14.2883465Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e43b258f943c7149.xml (deflated 28%) 2025-12-04T12:25:14.2884125Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed8ce545db3785b0.xml (deflated 28%) 2025-12-04T12:25:14.2884781Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51bd71d27c2db4f0.xml (deflated 28%) 2025-12-04T12:25:14.2885494Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-72f602b330e606cb.xml (deflated 28%) 2025-12-04T12:25:14.2886172Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-94537227bc12f698.xml (deflated 28%) 2025-12-04T12:25:14.2886822Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7368dd24235350f.xml (deflated 28%) 2025-12-04T12:25:14.2887482Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-12e19ecac0707a9f.xml (deflated 28%) 2025-12-04T12:25:14.2888134Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49aeb17bc0069227.xml (deflated 28%) 2025-12-04T12:25:14.2888800Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-82678a9127d50625.xml (deflated 28%) 2025-12-04T12:25:14.2889459Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eeb723e5683986dd.xml (deflated 35%) 2025-12-04T12:25:14.2890115Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7dd0923a385a5b44.xml (deflated 44%) 2025-12-04T12:25:14.2890778Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875b3394fe6124ff.xml (deflated 36%) 2025-12-04T12:25:14.2891427Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a01719010801f0eb.xml (deflated 36%) 2025-12-04T12:25:14.2892086Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-abb38b8b64296782.xml (deflated 36%) 2025-12-04T12:25:14.2892748Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-35d5d4bfe910714e.xml (deflated 35%) 2025-12-04T12:25:14.2893411Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fcdbe5c8d6246957.xml (deflated 
44%) 2025-12-04T12:25:14.2894068Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f2d32d76cd9ea4c.xml (deflated 44%) 2025-12-04T12:25:14.2894714Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d01dd7848e58726.xml (deflated 43%) 2025-12-04T12:25:14.2895370Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b37ec36150974cdc.xml (deflated 43%) 2025-12-04T12:25:14.2896050Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a5c97ba7476f9699.xml (deflated 43%) 2025-12-04T12:25:14.2896965Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f7bc9881e047dd1.xml (deflated 43%) 2025-12-04T12:25:14.2897637Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0d8492641a4c3af3.xml (deflated 43%) 2025-12-04T12:25:14.2898305Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a118777d82e8d7e.xml (deflated 36%) 2025-12-04T12:25:14.2898990Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6f1779e409eaf9fb.xml (deflated 45%) 2025-12-04T12:25:14.2899666Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a2c564c0db133fb.xml (deflated 36%) 2025-12-04T12:25:14.2900417Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4e9ae811cf30c32.xml (deflated 44%) 2025-12-04T12:25:14.2901134Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a0ffda73db67d0e.xml (deflated 44%) 2025-12-04T12:25:14.2901814Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b10091684b37c862.xml (deflated 41%) 2025-12-04T12:25:14.2902617Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-362536b218c78604.xml (deflated 35%) 2025-12-04T12:25:14.2903317Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2a2b6d5dc912ba1.xml (deflated 35%) 2025-12-04T12:25:14.2903995Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2bfa612f1908806e.xml (deflated 43%) 2025-12-04T12:25:14.2904674Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c241632c1bd2254.xml (deflated 36%) 2025-12-04T12:25:14.2905351Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-300d15ebe169a67d.xml (deflated 56%) 2025-12-04T12:25:14.2906026Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2664154f3bddb6ff.xml (deflated 44%) 2025-12-04T12:25:14.2906699Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b262143f686a88dd.xml (deflated 43%) 2025-12-04T12:25:14.2907378Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c004db07f7b0860b.xml (deflated 43%) 2025-12-04T12:25:14.2908052Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc18c93bde07fa33.xml (deflated 44%) 2025-12-04T12:25:14.2908853Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d33e44b619f43cc1.xml (deflated 57%) 2025-12-04T12:25:14.2909503Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c44272ce3d4ac199.xml (deflated 36%) 2025-12-04T12:25:14.2910168Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea07358affb5e144.xml (deflated 36%) 2025-12-04T12:25:14.2910814Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c57c7620876639a.xml (deflated 43%) 2025-12-04T12:25:14.2911495Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eede0e2726c06cab.xml (deflated 36%) 2025-12-04T12:25:14.2912192Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a276c210ef7f6689.xml (deflated 43%) 2025-12-04T12:25:14.2912847Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd59825a029f8f8b.xml (deflated 35%) 2025-12-04T12:25:14.2913504Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f5a9742e1242440.xml (deflated 38%) 2025-12-04T12:25:14.2914152Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6b0873e59b83bf9a.xml (deflated 36%) 2025-12-04T12:25:14.2914804Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64bbf1c836e72a15.xml (deflated 35%) 2025-12-04T12:25:14.2915468Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83300f2b97b0a07.xml (deflated 36%) 2025-12-04T12:25:14.2916179Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-46e1a3ccabb4ea53.xml (deflated 35%) 2025-12-04T12:25:14.2916869Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52cd579e7fe5892c.xml (deflated 44%) 2025-12-04T12:25:14.2917522Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb876d9d148638c4.xml (deflated 44%) 2025-12-04T12:25:14.2918180Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-419043608d870248.xml (deflated 44%) 2025-12-04T12:25:14.2918831Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03caaef3ff0396d9.xml (deflated 44%) 2025-12-04T12:25:14.2919481Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a49158b49188737a.xml (deflated 43%) 2025-12-04T12:25:14.2920147Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9371e4128a3ac8fe.xml 
(deflated 43%) 2025-12-04T12:25:14.2920941Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf7e7c630fc800f5.xml (deflated 43%) 2025-12-04T12:25:14.2921779Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f263367a9b8ff205.xml (deflated 44%) 2025-12-04T12:25:14.2922452Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9da5cc1abf82fc88.xml (deflated 43%) 2025-12-04T12:25:14.2923128Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17270d7c5dcce82d.xml (deflated 43%) 2025-12-04T12:25:14.2923819Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a8a0406f3c10fb.xml (deflated 36%) 2025-12-04T12:25:14.2924497Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8955835fa53fe405.xml (deflated 43%) 2025-12-04T12:25:14.2925179Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41e8000da4470974.xml (deflated 36%) 2025-12-04T12:25:14.2925852Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17b82ffe3c62718d.xml (deflated 36%) 2025-12-04T12:25:14.2926524Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-550a077945687423.xml (deflated 42%) 2025-12-04T12:25:14.2927262Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-97658b25492d180c.xml (deflated 36%) 2025-12-04T12:25:14.2927940Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5ba6b434230b8a31.xml (deflated 42%) 2025-12-04T12:25:14.2928630Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ab85cfcce385bb9.xml (deflated 36%) 2025-12-04T12:25:14.2929295Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205c67b3e9ea2006.xml (deflated 36%) 2025-12-04T12:25:14.2929983Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a7727ff60499e455.xml (deflated 36%) 2025-12-04T12:25:14.2930646Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5545774781103441.xml (deflated 35%) 2025-12-04T12:25:14.2931399Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69b99129eec5d274.xml (deflated 37%) 2025-12-04T12:25:14.2932122Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-71229775f4c708c6.xml (deflated 44%) 2025-12-04T12:25:14.2932796Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ef94932e8a93743e.xml (deflated 43%) 2025-12-04T12:25:14.2933586Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-830e1894dcf5c994.xml (deflated 43%) 2025-12-04T12:25:14.2934337Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d6ec9fe8576de151.xml (deflated 36%) 2025-12-04T12:25:14.2934980Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ac8ca9bd1994ece.xml (deflated 37%) 2025-12-04T12:25:14.2935618Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3403d5bb8935cb4e.xml (deflated 36%) 2025-12-04T12:25:14.2936252Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0c166deb400ad9d.xml (deflated 36%) 2025-12-04T12:25:14.2937133Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-60e4e17b51df739f.xml (deflated 35%) 2025-12-04T12:25:14.2937802Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22eb7410be2437d9.xml (deflated 35%) 2025-12-04T12:25:14.2938486Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9ee70791b9debd6c.xml (deflated 44%) 2025-12-04T12:25:14.2939167Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81abecf194df2c45.xml (deflated 44%) 2025-12-04T12:25:14.2939836Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1136154023961765.xml (deflated 43%) 2025-12-04T12:25:14.2940523Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cfef205e8493de16.xml (deflated 36%) 2025-12-04T12:25:14.2941196Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd599f355b8caaeb.xml (deflated 36%) 2025-12-04T12:25:14.2941884Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62ca7bd8b65dea10.xml (deflated 44%) 2025-12-04T12:25:14.2942559Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b3d3e55cfe315fc5.xml (deflated 36%) 2025-12-04T12:25:14.2943274Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a45eb631d6c35ef.xml (deflated 44%) 2025-12-04T12:25:14.2943962Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aae6fb78854ea6ff.xml (deflated 36%) 2025-12-04T12:25:14.2944636Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9eef2c9b45729eeb.xml (deflated 47%) 2025-12-04T12:25:14.2945325Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d106ae3bbe7d9e5c.xml (deflated 35%) 2025-12-04T12:25:14.2945996Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ff643138d43dd85.xml (deflated 56%) 2025-12-04T12:25:14.2946681Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c72d0c28afc7b8b.xml (deflated 35%) 2025-12-04T12:25:14.2947406Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8cb6ed13882ace9d.xml 
(deflated 35%) 2025-12-04T12:25:14.2948104Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51d5ea88c29b6ed7.xml (deflated 43%) 2025-12-04T12:25:14.2948898Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0e2af92baadfb43c.xml (deflated 36%) 2025-12-04T12:25:14.2949613Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ee64e4888310471.xml (deflated 35%) 2025-12-04T12:25:14.2950225Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2124f6a7f1f8a6ad.xml (deflated 35%) 2025-12-04T12:25:14.2950822Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a72595ddb271e95.xml (deflated 43%) 2025-12-04T12:25:14.2951426Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f5a0fd7e9efb76d5.xml (deflated 44%) 2025-12-04T12:25:14.2952037Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f05ec777ac110fb6.xml (deflated 36%) 2025-12-04T12:25:14.2952635Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c4dbe227aaf8cd2.xml (deflated 43%) 2025-12-04T12:25:14.2953242Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d8d80edc2b8c69e.xml (deflated 36%) 2025-12-04T12:25:14.2953845Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-50add8f3174dd7ac.xml (deflated 35%) 2025-12-04T12:25:14.2954458Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-851cdc069dcc69f7.xml (deflated 36%) 2025-12-04T12:25:14.2955054Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1acd79e907003b41.xml (deflated 46%) 2025-12-04T12:25:14.2955653Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0ff1f71f9283f58.xml (deflated 45%) 2025-12-04T12:25:14.2956256Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-65237f33092a4b4f.xml (deflated 36%) 2025-12-04T12:25:14.2956860Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5046dc8bfb623fa3.xml (deflated 35%) 2025-12-04T12:25:14.2957502Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4878dd0838c676b7.xml (deflated 44%) 2025-12-04T12:25:14.2958110Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-66566e960af2b7cd.xml (deflated 35%) 2025-12-04T12:25:14.2958715Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9252bf6025e90d42.xml (deflated 36%) 2025-12-04T12:25:14.2959330Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5b920f5d1c4972a5.xml (deflated 36%) 2025-12-04T12:25:14.2959929Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41378464ce08003d.xml (deflated 36%) 2025-12-04T12:25:14.2960540Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee4c603fd47011fa.xml (deflated 44%) 2025-12-04T12:25:14.2961201Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9973927e7b530617.xml (deflated 44%) 2025-12-04T12:25:14.2961845Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faddb0db331380df.xml (deflated 43%) 2025-12-04T12:25:14.2962453Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-babf9f26b0f01a05.xml (deflated 42%) 2025-12-04T12:25:14.2963053Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-682bb4a108ba0cff.xml (deflated 43%) 2025-12-04T12:25:14.2963664Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0185f9ec4d4c49f.xml (deflated 43%) 2025-12-04T12:25:14.2964262Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-011699f09fdd352f.xml (deflated 43%) 2025-12-04T12:25:14.2964876Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c6b066059948ead.xml (deflated 36%) 2025-12-04T12:25:14.2965483Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22fab5f0e190ff66.xml (deflated 44%) 2025-12-04T12:25:14.2966084Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-55702aa5023cfcc5.xml (deflated 36%) 2025-12-04T12:25:14.2966695Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccae7814a1c4777f.xml (deflated 44%) 2025-12-04T12:25:14.2967299Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5bd848f11487517d.xml (deflated 44%) 2025-12-04T12:25:14.2967911Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27d68b49187eba1f.xml (deflated 41%) 2025-12-04T12:25:14.2968519Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cf1bc9411dde71e0.xml (deflated 35%) 2025-12-04T12:25:14.2969136Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-445a5d7115d23df5.xml (deflated 35%) 2025-12-04T12:25:14.2969739Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-44a168cde9f7a829.xml (deflated 43%) 2025-12-04T12:25:14.2970343Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1ba388d3de704172.xml (deflated 35%) 2025-12-04T12:25:14.2970956Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd986c0befb813c2.xml (deflated 56%) 2025-12-04T12:25:14.2971595Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4610efe5376dfca1.xml (deflated 44%) 2025-12-04T12:25:14.2972206Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8b4358fed50c59f1.xml (deflated 43%) 2025-12-04T12:25:14.2972811Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-526a02721a1ba5da.xml (deflated 43%) 2025-12-04T12:25:14.2973421Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c0978e54cc6fc10.xml (deflated 44%) 2025-12-04T12:25:14.2974031Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf5a35496e65d5e4.xml (deflated 57%) 2025-12-04T12:25:14.2974643Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee9c4c3ca48fe737.xml (deflated 36%) 2025-12-04T12:25:14.2975311Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5ca791415d7ead2.xml (deflated 36%) 2025-12-04T12:25:14.2975937Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4b280a14c5b58c7c.xml (deflated 43%) 2025-12-04T12:25:14.2976607Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f1e7a55058f0a18.xml (deflated 36%) 2025-12-04T12:25:14.2977444Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c9d23e4c6bbfd6d1.xml (deflated 43%) 2025-12-04T12:25:14.2978128Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d04adc5353a474ef.xml (deflated 35%) 2025-12-04T12:25:14.2978824Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dd5c3fba431f03e3.xml (deflated 37%) 2025-12-04T12:25:14.2979504Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23246ae737e62ded.xml (deflated 36%) 2025-12-04T12:25:14.2980196Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aa7ae0f58f2813b.xml (deflated 35%) 2025-12-04T12:25:14.2980874Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cd7e251b7cd67b87.xml (deflated 36%) 2025-12-04T12:25:14.2981571Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ffef4b2a54e0ec6.xml (deflated 35%) 2025-12-04T12:25:14.2982249Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f47719c8fab0f3fd.xml (deflated 44%) 2025-12-04T12:25:14.2982934Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7f97df23e3af62b7.xml (deflated 44%) 2025-12-04T12:25:14.2983624Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d9b569377c5e6b5.xml (deflated 44%) 2025-12-04T12:25:14.2984304Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e79d7fc843c87404.xml (deflated 44%) 2025-12-04T12:25:14.2984990Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b4908c887012bf3.xml (deflated 43%) 2025-12-04T12:25:14.2985669Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-15d9380e1c9a62c7.xml (deflated 43%) 2025-12-04T12:25:14.2986390Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-89d48b8548171ec2.xml (deflated 43%) 2025-12-04T12:25:14.2987084Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e87d273ae3e5c7f4.xml (deflated 43%) 2025-12-04T12:25:14.2987768Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5becb9fcc2b2a740.xml (deflated 43%) 2025-12-04T12:25:14.2988467Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e50500c3a0076f9a.xml (deflated 43%) 2025-12-04T12:25:14.2989234Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c28f45efdfac39c4.xml (deflated 36%) 2025-12-04T12:25:14.2989850Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9fcea5b98362b6a.xml (deflated 43%) 2025-12-04T12:25:14.2990503Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23763de39322c899.xml (deflated 35%) 2025-12-04T12:25:14.2991130Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7a5837d4cf564eb.xml (deflated 35%) 2025-12-04T12:25:14.2991749Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6098aefa2030078.xml (deflated 42%) 2025-12-04T12:25:14.2992349Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d3b389690949ffc.xml (deflated 36%) 2025-12-04T12:25:14.2992961Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-00c0b12dc56300ed.xml (deflated 43%) 2025-12-04T12:25:14.2993563Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875462dd555a5412.xml (deflated 35%) 2025-12-04T12:25:14.2994165Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5da26e78fc052180.xml (deflated 36%) 2025-12-04T12:25:14.2994783Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-705b7a3606470644.xml (deflated 36%) 2025-12-04T12:25:14.2995380Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3996750239d4977f.xml (deflated 35%) 2025-12-04T12:25:14.2995994Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b1bfbeb9b34c8574.xml (deflated 36%) 2025-12-04T12:25:14.2996600Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c5cc720d34bebc6.xml (deflated 44%) 2025-12-04T12:25:14.2997218Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5eb76bc9735e309.xml (deflated 43%) 2025-12-04T12:25:14.2997824Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a28d2b8c4bb8b97.xml (deflated 43%) 2025-12-04T12:25:14.2998440Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2fa0ff1a8410ed4.xml (deflated 36%) 2025-12-04T12:25:14.2999052Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42750e8459e7d15b.xml (deflated 37%) 2025-12-04T12:25:14.2999654Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d44ddde7846d301e.xml (deflated 36%) 2025-12-04T12:25:14.3000293Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d84034c24f131de9.xml (deflated 36%) 2025-12-04T12:25:14.3000900Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b21382e4a0d075d7.xml (deflated 36%) 2025-12-04T12:25:14.3001502Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f01856e9a2028bff.xml (deflated 35%) 2025-12-04T12:25:14.3002115Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d271f82508cdd35e.xml (deflated 44%) 2025-12-04T12:25:14.3002717Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-602ab3c67d585e00.xml (deflated 44%) 2025-12-04T12:25:14.3003333Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c4b4f500cbe46b2.xml (deflated 43%) 2025-12-04T12:25:14.3003941Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-060bfe393d18a7b7.xml (deflated 36%) 2025-12-04T12:25:14.3004635Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-08a6cb454dfb3288.xml (deflated 36%) 2025-12-04T12:25:14.3005262Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14f8591ab0b18d47.xml (deflated 44%) 2025-12-04T12:25:14.3005869Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faf65bc8adad7023.xml (deflated 36%) 2025-12-04T12:25:14.3006488Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ab921a38daba1bb.xml (deflated 45%) 2025-12-04T12:25:14.3007094Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205a17c445d16b08.xml (deflated 36%) 2025-12-04T12:25:14.3007710Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14314f5e6064defd.xml (deflated 47%) 2025-12-04T12:25:14.3008311Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9a98077fc0a28449.xml (deflated 36%) 2025-12-04T12:25:14.3008923Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e2de3e4d8afa5ff.xml (deflated 56%) 2025-12-04T12:25:14.3009539Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-512586046bd1af6f.xml (deflated 36%) 2025-12-04T12:25:14.3010143Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1fa69b7512f74eae.xml (deflated 36%) 2025-12-04T12:25:14.3010753Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-70138f82b180a3f5.xml (deflated 43%) 2025-12-04T12:25:14.3011357Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b7ed61d0627f9533.xml (deflated 36%) 2025-12-04T12:25:14.3011971Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-493e10e45797f8fa.xml (deflated 36%) 2025-12-04T12:25:14.3012572Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-87c65811f60e5e0f.xml (deflated 35%) 2025-12-04T12:25:14.3013177Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-635f35dfbbc33c85.xml (deflated 43%) 2025-12-04T12:25:14.3013791Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-355930f4da4ab18f.xml (deflated 45%) 2025-12-04T12:25:14.3014427Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6333fa7d0fe5c91.xml (deflated 37%) 2025-12-04T12:25:14.3015050Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3076e5b00c0eef07.xml (deflated 43%) 2025-12-04T12:25:14.3015647Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9141798051401a79.xml (deflated 36%) 2025-12-04T12:25:14.3016246Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d96c5808f2f4d423.xml (deflated 35%) 2025-12-04T12:25:14.3017114Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-59eca95b80bf15e4.xml (deflated 36%) 2025-12-04T12:25:14.3017799Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eeb7f329dcb1625.xml (deflated 46%) 2025-12-04T12:25:14.3018564Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c438893677b09839.xml (deflated 45%) 2025-12-04T12:25:14.3019272Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d707ddf229008c6a.xml (deflated 36%) 2025-12-04T12:25:14.3019959Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c31ce4d4db4e93a.xml (deflated 35%) 2025-12-04T12:25:14.3020631Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-714862760bd05954.xml (deflated 37%) 2025-12-04T12:25:14.3021487Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16429bc307938d70.xml (deflated 35%) 2025-12-04T12:25:14.3022183Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-92f77f3d8cd66053.xml (deflated 36%) 2025-12-04T12:25:14.3022859Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deed4e34c84ee498.xml (deflated 45%) 2025-12-04T12:25:14.3023537Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-425b9693fd331423.xml (deflated 35%) 2025-12-04T12:25:14.3024206Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9149f9baa8d84141.xml (deflated 43%) 2025-12-04T12:25:14.3024885Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d5cc488c73d225.xml (deflated 43%) 2025-12-04T12:25:14.3025566Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-017a63f22f7a2e26.xml (deflated 36%) 2025-12-04T12:25:14.3026247Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e6391f21f8fa7c0.xml (deflated 35%) 2025-12-04T12:25:14.3026927Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9e8b675076ef3915.xml (deflated 36%) 2025-12-04T12:25:14.3027600Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b8d64d4666fb6c9d.xml (deflated 36%) 2025-12-04T12:25:14.3028280Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0dee982caae0bf52.xml (deflated 35%) 2025-12-04T12:25:14.3028954Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0df7122c519ced4f.xml (deflated 36%) 2025-12-04T12:25:14.3029693Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2827e400085e914f.xml (deflated 45%) 2025-12-04T12:25:14.3030365Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d39e0b557433741.xml (deflated 44%) 2025-12-04T12:25:14.3031040Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6c5067f69c5dc42.xml (deflated 44%) 2025-12-04T12:25:14.3031707Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d40c5c296523fcf4.xml (deflated 44%) 2025-12-04T12:25:14.3032372Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e19c088745912810.xml (deflated 35%) 2025-12-04T12:25:14.3033253Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-21b633b88362af20.xml (deflated 35%) 2025-12-04T12:25:14.3033917Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f1d69885e8023d73.xml 
(deflated 35%) 2025-12-04T12:25:14.3034550Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76455ff9fe96f12c.xml (deflated 35%) 2025-12-04T12:25:14.3035150Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9224f6b7ff8b973c.xml (deflated 36%) 2025-12-04T12:25:14.3035743Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64019cd840b5ae37.xml (deflated 43%) 2025-12-04T12:25:14.3036346Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c52c688cda6423d1.xml (deflated 44%) 2025-12-04T12:25:14.3036944Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-56aae62a7e88ec0a.xml (deflated 35%) 2025-12-04T12:25:14.3037552Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-126517b1e280f193.xml (deflated 36%) 2025-12-04T12:25:14.3038145Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d346d213506e58a.xml (deflated 36%) 2025-12-04T12:25:14.3038751Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-093f4d1e23acb10f.xml (deflated 57%) 2025-12-04T12:25:14.3039343Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-810e1605bd5350e8.xml (deflated 36%) 2025-12-04T12:25:14.3039937Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-43db9cfa18063736.xml (deflated 36%) 2025-12-04T12:25:14.3040543Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d256d1cc46d8d8d.xml (deflated 36%) 2025-12-04T12:25:14.3041147Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0174602e3f0dc49.xml (deflated 42%) 2025-12-04T12:25:14.3041748Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d15167d0a9773e6.xml (deflated 35%) 2025-12-04T12:25:14.3042340Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2a355bd7e8aa2084.xml (deflated 35%) 2025-12-04T12:25:14.3042935Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a694586bb28814d4.xml (deflated 37%) 2025-12-04T12:25:14.3043540Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-91f11f0cc30a0889.xml (deflated 35%) 2025-12-04T12:25:14.3044161Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc882534d0c7ac9e.xml (deflated 35%) 2025-12-04T12:25:14.3044765Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3576431fa0a79154.xml (deflated 36%) 2025-12-04T12:25:14.3045361Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e1893ad67dccf3.xml (deflated 35%) 2025-12-04T12:25:14.3045965Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-148510b891c749c6.xml (deflated 35%) 2025-12-04T12:25:14.3046561Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6549972a7efaf11.xml (deflated 35%) 2025-12-04T12:25:14.3047160Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ea6ea860d10e295.xml (deflated 36%) 2025-12-04T12:25:14.3048012Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-83ab4f7124e50996.xml (deflated 36%) 2025-12-04T12:25:14.3048742Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6c1a924e8712f89.xml (deflated 43%) 2025-12-04T12:25:14.3049570Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0bec6d0d6dd273b2.xml (deflated 36%) 2025-12-04T12:25:14.3050262Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ce5c2131a079a118.xml (deflated 36%) 2025-12-04T12:25:14.3050933Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9aa0d7a04a1b05f2.xml (deflated 44%) 2025-12-04T12:25:14.3051608Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e0e890e418ce3a.xml (deflated 44%) 2025-12-04T12:25:14.3052281Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4cffe073269e4f0a.xml (deflated 43%) 2025-12-04T12:25:14.3052939Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fb78beccd38dd26e.xml (deflated 42%) 2025-12-04T12:25:14.3053590Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c24763a200436369.xml (deflated 36%) 2025-12-04T12:25:14.3054244Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95f84fd6ea33eee0.xml (deflated 47%) 2025-12-04T12:25:14.3054893Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-88fe6d3cec93de32.xml (deflated 35%) 2025-12-04T12:25:14.3055548Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0260bf01f397061e.xml (deflated 35%) 2025-12-04T12:25:14.3056205Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc07ca8676eed412.xml (deflated 36%) 2025-12-04T12:25:14.3057115Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c73c9ddbbd799146.xml (deflated 43%) 2025-12-04T12:25:14.3057783Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d73e4a124891508d.xml (deflated 35%) 2025-12-04T12:25:14.3058457Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e44eef95a4d81dc3.xml (deflated 36%) 2025-12-04T12:25:14.3059176Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d0f5373874b1c4.xml 
(deflated 36%) 2025-12-04T12:25:14.3059850Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c88483e90b04648.xml (deflated 35%) 2025-12-04T12:25:14.3060540Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccf199cbc8b611ab.xml (deflated 37%) 2025-12-04T12:25:14.3061217Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6a4daccc9da30cdb.xml (deflated 36%) 2025-12-04T12:25:14.3061899Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d983aecef8c58dfb.xml (deflated 36%) 2025-12-04T12:25:14.3062571Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-746325984b31e17e.xml (deflated 43%) 2025-12-04T12:25:14.3063303Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b8591cc84ef2a6a.xml (deflated 43%) 2025-12-04T12:25:14.3064011Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4d97d092b2123a2.xml (deflated 37%) 2025-12-04T12:25:14.3064678Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1574030634816010.xml (deflated 36%) 2025-12-04T12:25:14.3065360Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5fa3a6eb60f4eca4.xml (deflated 36%) 2025-12-04T12:25:14.3066034Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e754e92f5037c52.xml (deflated 35%) 2025-12-04T12:25:14.3066702Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-020049def8c5b0a9.xml (deflated 43%) 2025-12-04T12:25:14.3067388Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d4dd04eda8983093.xml (deflated 36%) 2025-12-04T12:25:14.3068057Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a612b5b9d29cdf4.xml (deflated 36%) 2025-12-04T12:25:14.3068937Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f0f750f594e5734b.xml (deflated 43%) 2025-12-04T12:25:14.3069534Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7cb1e30e8a2e57ea.xml (deflated 43%) 2025-12-04T12:25:14.3070127Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc8052641a24d5dc.xml (deflated 44%) 2025-12-04T12:25:14.3070735Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d8cbbb1187ec0f64.xml (deflated 36%) 2025-12-04T12:25:14.3071329Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83af7e95786df72.xml (deflated 35%) 2025-12-04T12:25:14.3071931Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a731f1e0a2629b95.xml (deflated 44%) 2025-12-04T12:25:14.3072527Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ae47b09c2c50f23.xml (deflated 42%) 2025-12-04T12:25:14.3073129Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ec880e83b34c8e36.xml (deflated 47%) 2025-12-04T12:25:14.3073724Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c3833fdae73dbf3c.xml (deflated 47%) 2025-12-04T12:25:14.3074359Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-86aa7d82374c9e5b.xml (deflated 56%) 2025-12-04T12:25:14.3074968Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a10e426b5fcbde30.xml (deflated 35%) 2025-12-04T12:25:14.3075572Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ff35c7e5488dd9ac.xml (deflated 35%) 2025-12-04T12:25:14.3076176Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-924d345c27601ea8.xml (deflated 44%) 2025-12-04T12:25:14.3076776Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1681683ab3d327ac.xml (deflated 36%) 2025-12-04T12:25:14.3077373Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22e9fd6e5aba0f0d.xml (deflated 36%) 2025-12-04T12:25:14.3078032Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9dffcfba1bc1e60.xml (deflated 35%) 2025-12-04T12:25:14.3078663Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1b652ce23cebda63.xml (deflated 36%) 2025-12-04T12:25:14.3079269Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5b9a6fa991ecf1c.xml (deflated 44%) 2025-12-04T12:25:14.3079868Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f3a9e9304d25446.xml (deflated 45%) 2025-12-04T12:25:14.3080468Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0390eeced956f562.xml (deflated 36%) 2025-12-04T12:25:14.3081058Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-439532956daa54d1.xml (deflated 43%) 2025-12-04T12:25:14.3081660Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0f977aa3cd3cecaf.xml (deflated 41%) 2025-12-04T12:25:14.3082260Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-24127363c11860de.xml (deflated 41%) 2025-12-04T12:25:14.3082853Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0cd422e8a222e606.xml (deflated 36%) 2025-12-04T12:25:14.3083454Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27b9de38969ee6f6.xml (deflated 35%) 2025-12-04T12:25:14.3084053Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62abfea4d6932c1e.xml 
(deflated 36%) 2025-12-04T12:25:14.3084663Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d86e179dbef96adf.xml (deflated 35%) 2025-12-04T12:25:14.3085279Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6abc3b994eecaab.xml (deflated 37%) 2025-12-04T12:25:14.3085880Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f8fe4b288348a5e8.xml (deflated 35%) 2025-12-04T12:25:14.3086486Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e1865fe4cd352327.xml (deflated 36%) 2025-12-04T12:25:14.3087085Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d135dba3284d9dd.xml (deflated 45%) 2025-12-04T12:25:14.3087720Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ce519dd6997621a.xml (deflated 35%) 2025-12-04T12:25:14.3088320Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d25b88aa16186c5.xml (deflated 43%) 2025-12-04T12:25:14.3088921Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2b545a8cfb56682b.xml (deflated 43%) 2025-12-04T12:25:14.3089529Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-96320154d0a3f580.xml (deflated 36%) 2025-12-04T12:25:14.3090131Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d58d0eb09203fc2c.xml (deflated 35%) 2025-12-04T12:25:14.3090739Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76e7132ba7ac5de0.xml (deflated 36%) 2025-12-04T12:25:14.3091387Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a537f0ef8ed460d9.xml (deflated 36%) 2025-12-04T12:25:14.3092010Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c40fad651035635.xml (deflated 35%) 2025-12-04T12:25:14.3092618Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68c5b031d9a5ae9e.xml (deflated 36%) 2025-12-04T12:25:14.3093215Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-712b0b28be8414a0.xml (deflated 44%) 2025-12-04T12:25:14.3093820Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eca96992921c511.xml (deflated 44%) 2025-12-04T12:25:14.3094417Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7834531011d91518.xml (deflated 44%) 2025-12-04T12:25:14.3095029Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68f03a926c8d2bd9.xml (deflated 44%) 2025-12-04T12:25:14.3095634Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e49faae68d1ac0d9.xml (deflated 36%) 2025-12-04T12:25:14.3096232Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc4d026c52898da8.xml (deflated 35%) 2025-12-04T12:25:14.3097098Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03eaa4726076d233.xml (deflated 35%) 2025-12-04T12:25:14.3097774Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d471afa2e27428d.xml (deflated 35%) 2025-12-04T12:25:14.3098460Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-065a466bb3b41d27.xml (deflated 36%) 2025-12-04T12:25:14.3099145Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f328e482896672aa.xml (deflated 43%) 2025-12-04T12:25:14.3099827Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee7ee7e277bba08f.xml (deflated 44%) 2025-12-04T12:25:14.3100540Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e55ae93852ba5a41.xml (deflated 36%) 2025-12-04T12:25:14.3101224Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6750ff7d9a08403d.xml (deflated 36%) 2025-12-04T12:25:14.3101901Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d85fe03caf11b880.xml (deflated 35%) 2025-12-04T12:25:14.3102624Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f90e1eb29ec7a7eb.xml (deflated 57%) 2025-12-04T12:25:14.3103318Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c515ad73db9ec0f.xml (deflated 36%) 2025-12-04T12:25:14.3103997Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-be5d3342961d1397.xml (deflated 36%) 2025-12-04T12:25:14.3104682Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81a8ca35b73b2608.xml (deflated 36%) 2025-12-04T12:25:14.3105363Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6eb3b25e1011068f.xml (deflated 41%) 2025-12-04T12:25:14.3106055Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16ab3c0f531a2710.xml (deflated 35%) 2025-12-04T12:25:14.3106786Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e912af285a88a53.xml (deflated 35%) 2025-12-04T12:25:14.3107493Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-043dda7312ce02a9.xml (deflated 37%) 2025-12-04T12:25:14.3108182Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3cf2335721c75edb.xml (deflated 36%) 2025-12-04T12:25:14.3108857Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed68ee99b507df29.xml (deflated 35%) 2025-12-04T12:25:14.3109639Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-afe3aa9ea643db5b.xml (deflated 36%) 2025-12-04T12:25:14.3110283Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-706ef1f553cb8cca.xml (deflated 35%) 2025-12-04T12:25:14.3110923Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a98124b8f8d7b3ef.xml (deflated 35%) 2025-12-04T12:25:14.3111566Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee37bb64a8e84ec5.xml (deflated 36%) 2025-12-04T12:25:14.3112208Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2af230e2fec6d35.xml (deflated 36%) 2025-12-04T12:25:14.3112854Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3008545966a2ad5b.xml (deflated 36%) 2025-12-04T12:25:14.3113489Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53870facd803211b.xml (deflated 43%) 2025-12-04T12:25:14.3114146Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4eca7697caf90c2a.xml (deflated 36%) 2025-12-04T12:25:14.3114784Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c4554d604268fb5.xml (deflated 36%) 2025-12-04T12:25:14.3115419Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c6b52be0b4531e90.xml (deflated 43%) 2025-12-04T12:25:14.3116059Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c63a3f0987273dba.xml (deflated 44%) 2025-12-04T12:25:14.3116695Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b58af3771e34dd96.xml (deflated 43%) 2025-12-04T12:25:14.3117367Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-587b09149e6cc83f.xml (deflated 42%) 2025-12-04T12:25:14.3118005Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e3786dc33e6abd50.xml (deflated 36%) 2025-12-04T12:25:14.3118646Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dfce7e92d72e48a2.xml (deflated 47%) 2025-12-04T12:25:14.3119295Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-627617d506ff1d2f.xml (deflated 36%) 2025-12-04T12:25:14.3119929Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64530dfd24199eb7.xml (deflated 36%) 2025-12-04T12:25:14.3120574Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ddc33c5ddc10dde.xml (deflated 36%) 2025-12-04T12:25:14.3121696Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0632db0896072cf.xml (deflated 43%) 2025-12-04T12:25:14.3122447Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-edeb0bbc0394ec67.xml (deflated 35%) 2025-12-04T12:25:14.3123138Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e515d47fe2e6fb9c.xml (deflated 36%) 2025-12-04T12:25:14.3123828Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c4f0278f004bb5c.xml (deflated 36%) 2025-12-04T12:25:14.3124508Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c0d3bae257da8444.xml (deflated 35%) 2025-12-04T12:25:14.3125189Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7025af433f00efbb.xml (deflated 37%) 2025-12-04T12:25:14.3125885Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49fd198402d5c655.xml (deflated 36%) 2025-12-04T12:25:14.3126567Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5277c0b0a803851c.xml (deflated 36%) 2025-12-04T12:25:14.3127259Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d4c61b2ce73c677.xml (deflated 43%) 2025-12-04T12:25:14.3127938Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb0710cc3c031aa2.xml (deflated 43%) 2025-12-04T12:25:14.3128623Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e4cf4d2497acecc4.xml (deflated 38%) 2025-12-04T12:25:14.3129317Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0b71a9d976366a8.xml (deflated 36%) 2025-12-04T12:25:14.3129998Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8c2b944477a517c5.xml (deflated 36%) 2025-12-04T12:25:14.3130685Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c7a620380978373.xml (deflated 35%) 2025-12-04T12:25:14.3131371Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aaa461eddd2a0f5.xml (deflated 43%) 2025-12-04T12:25:14.3132063Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5c5af8107d86770.xml (deflated 36%) 2025-12-04T12:25:14.3132788Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-629d0d3ddf4c3e06.xml (deflated 36%) 2025-12-04T12:25:14.3133565Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7350065f0535f01a.xml (deflated 43%) 2025-12-04T12:25:14.3134290Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-877f842d3f2815af.xml (deflated 43%) 2025-12-04T12:25:14.3134894Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c391387e4c62daf7.xml (deflated 44%) 2025-12-04T12:25:14.3135507Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cea6ac435fa81670.xml (deflated 36%) 2025-12-04T12:25:14.3136111Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69f0ceb782ba322d.xml (deflated 36%) 2025-12-04T12:25:14.3136949Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-354a8796ee4ffd32.xml (deflated 43%) 2025-12-04T12:25:14.3137739Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a60b9c4e3ec8c5.xml (deflated 42%) 2025-12-04T12:25:14.3138421Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-576d152cd04ca1c5.xml (deflated 47%) 2025-12-04T12:25:14.3139100Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5733f17598591d18.xml (deflated 47%) 2025-12-04T12:25:14.3139779Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d06b92a9ae7d27c.xml (deflated 56%) 2025-12-04T12:25:14.3140468Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ebef8e69977ebea2.xml (deflated 36%) 2025-12-04T12:25:14.3141150Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea6c158c65373811.xml (deflated 35%) 2025-12-04T12:25:14.3141832Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2ff679811871b4a.xml (deflated 44%) 2025-12-04T12:25:14.3142525Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc9e37194800f0d1.xml (deflated 36%) 2025-12-04T12:25:14.3143198Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5145615a66bd578b.xml (deflated 36%) 2025-12-04T12:25:14.3143889Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-33b7f705a30ded9f.xml (deflated 36%) 2025-12-04T12:25:14.3144567Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca496a8780de69f3.xml (deflated 36%) 2025-12-04T12:25:14.3145253Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8bec3baffba656ff.xml (deflated 44%) 2025-12-04T12:25:14.3145947Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c836ef383c971ad8.xml (deflated 45%) 2025-12-04T12:25:14.3146625Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deb32df1c36c795c.xml (deflated 36%) 2025-12-04T12:25:14.3147316Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6dabff71918e7b99.xml (deflated 42%) 2025-12-04T12:25:14.3147995Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca39e437f793eab2.xml (deflated 41%) 2025-12-04T12:25:14.3148715Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d93f79d5e733c01.xml (deflated 42%) 2025-12-04T12:25:14.3149453Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2079ea64f821f40e.xml (deflated 36%) 2025-12-04T12:25:14.3150052Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eb15a6e33c260556.xml (deflated 35%) 2025-12-04T12:25:14.3150667Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ae1eb5639088ccd8.xml (deflated 36%) 2025-12-04T12:25:14.3169914Z ##[group]Run # Remove any previous usage logs if they exist 2025-12-04T12:25:14.3170107Z # Remove any previous usage logs if they exist 2025-12-04T12:25:14.3170220Z rm -f logs-*.zip 2025-12-04T12:25:14.3170405Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true 2025-12-04T12:25:14.3170651Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true 2025-12-04T12:25:14.3176297Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:14.3176485Z env: 2025-12-04T12:25:14.3176602Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:14.3176695Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:14.3177045Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:14.3177379Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:14.3177696Z FILE_SUFFIX: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904 2025-12-04T12:25:14.3177806Z ##[endgroup] 2025-12-04T12:25:14.3230733Z adding: usage_log.txt (deflated 58%) 2025-12-04T12:25:14.3321020Z adding: test/test-reports/distributed.test_c10d_functional_native_1.1_5ceb4f282067967e_.log (deflated 85%) 2025-12-04T12:25:14.3324561Z adding: test/test-reports/distributed.fsdp.test_fsdp_overlap_1.1_6a5a97322901a03e_.log (deflated 90%) 2025-12-04T12:25:14.3336197Z adding: test/test-reports/distributed.fsdp.test_fsdp_pure_fp16_1.1_2de43ef0fea2c555_.log (deflated 96%) 2025-12-04T12:25:14.3337910Z adding: test/test-reports/distributed.tensor.debug.test_debug_mode_1.1_8a4ec9b51bad1d98_.log (deflated 81%) 2025-12-04T12:25:14.3378949Z adding: test/test-reports/distributed.fsdp.test_fsdp_exec_order_1.1_a2a67ccbd845e856_.log (deflated 97%) 2025-12-04T12:25:14.3425166Z adding: test/test-reports/distributed.fsdp.test_hsdp_dtensor_state_dict_1.1_8591eb8b13b136e6_.log (deflated 97%) 2025-12-04T12:25:14.3453609Z adding: test/test-reports/distributed.fsdp.test_fsdp_clip_grad_norm_1.1_4959fae61140b3a8_.log (deflated 96%) 2025-12-04T12:25:14.3631848Z adding: test/test-reports/distributed.fsdp.test_fsdp_core_2.2_6137898c6891d430_.log (deflated 96%) 2025-12-04T12:25:14.3632793Z adding: test/test-reports/distributed.algorithms.test_join_1.1_8f0ad2e1263a10f0_.log (deflated 84%) 2025-12-04T12:25:14.3635474Z adding: test/test-reports/distributed.pipelining.test_schedule_multiproc_1.1_3173a38c7a75b752_.log (deflated 90%) 2025-12-04T12:25:14.3637255Z adding: test/test-reports/distributed.test_compute_comm_reordering_1.1_7c582fe21d8b6d0b_.log (deflated 86%) 2025-12-04T12:25:14.3637681Z adding: test/test-reports/distributed.test_cupy_as_tensor_1.1_01ccc395c80cccfc_.log (deflated 53%) 2025-12-04T12:25:14.3638235Z adding: test/test-reports/distributed.fsdp.test_fsdp_fx_1.1_5233411b5b9ade93_.log (deflated 53%) 2025-12-04T12:25:14.3638928Z adding: 
test/test-reports/distributed._tools.test_sac_ilp_1.1_aac1d3e83d5577ad_.log (deflated 61%) 2025-12-04T12:25:14.3639645Z adding: test/test-reports/distributed.checkpoint.test_hf_storage_1.1_ec1da04f72df0c46_.log (deflated 67%) 2025-12-04T12:25:14.3640484Z adding: test/test-reports/distributed.pipelining.test_microbatch_1.1_e0b58af1802f4b06_.log (deflated 68%) 2025-12-04T12:25:14.3641104Z adding: test/test-reports/distributed.tensor.test_placement_types_1.1_c7b4602e70c3b07a_.log (deflated 68%) 2025-12-04T12:25:14.3641806Z adding: test/test-reports/distributed.tensor.test_dtensor_dispatch_overhead_1.1_85c49e7d8275b78b_.log (deflated 63%) 2025-12-04T12:25:14.3642190Z adding: test/test-reports/distributed.rpc.test_faulty_agent_1.1_9f30efe05bf109e0_.log (stored 0%) 2025-12-04T12:25:14.3642800Z adding: test/test-reports/distributed.checkpoint._experimental.test_checkpoint_reader_1.1_68c37a9fa1601552_.log (deflated 74%) 2025-12-04T12:25:14.3643945Z adding: test/test-reports/distributed.checkpoint.test_format_utils_1.1_04ae55b8cdf477fd_.log (deflated 80%) 2025-12-04T12:25:14.3648868Z adding: test/test-reports/distributed.test_aten_comm_compute_reordering_1.2_69f8c7d62333ccaf_.log (deflated 93%) 2025-12-04T12:25:14.3653091Z adding: test/test-reports/distributed.tensor.test_redistribute_2.2_51e2d05d075503bf_.log (deflated 91%) 2025-12-04T12:25:14.3655274Z adding: test/test-reports/distributed.tensor.parallel.test_tp_style_1.1_54e71dcd4ed048eb_.log (deflated 86%) 2025-12-04T12:25:14.3656679Z adding: test/test-reports/distributed.tensor.test_api_1.1_f4574b86db79cb55_.log (deflated 85%) 2025-12-04T12:25:14.3659155Z adding: test/test-reports/distributed.checkpoint.test_fsspec_1.1_8eaa241efddb416a_.log (deflated 86%) 2025-12-04T12:25:14.3659980Z adding: test/test-reports/distributed.tensor.experimental.test_tp_transform_1.1_d11081dcea691eaf_.log (deflated 84%) 2025-12-04T12:25:14.3660662Z adding: test/test-reports/distributed.checkpoint.test_traverse_1.1_eea2c84c34471245_.log (deflated 71%) 2025-12-04T12:25:14.3663434Z adding: test/test-reports/distributed.tensor.test_random_ops_1.1_b2ded413b82ba64f_.log (deflated 88%) 2025-12-04T12:25:14.3665334Z adding: test/test-reports/distributed._shard.sharded_tensor.ops.test_embedding_1.1_94d647ccb113bbd0_.log (deflated 91%) 2025-12-04T12:25:14.3665840Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_logging_1.1_334cd8181d21220c_.log (deflated 53%) 2025-12-04T12:25:14.3666362Z adding: test/test-reports/distributed.launcher.test_api_1.1_4a83e51b1f3b8245_.log (deflated 58%) 2025-12-04T12:25:14.3667175Z adding: test/test-reports/distributed.elastic.multiprocessing.test_api_1.1_4bf04d2a67164589_.log (deflated 72%) 2025-12-04T12:25:14.3667942Z adding: test/test-reports/distributed.fsdp.test_shard_utils_1.1_4e12f3568c69a797_.log (deflated 67%) 2025-12-04T12:25:14.3672323Z adding: test/test-reports/distributed.checkpoint.test_fsdp_optim_state_1.1_d25d2159eaa83e63_.log (deflated 96%) 2025-12-04T12:25:14.3680237Z adding: test/test-reports/distributed.checkpoint.e2e.test_e2e_save_and_load_1.1_4cbd59f9e8ee7ec0_.log (deflated 93%) 2025-12-04T12:25:14.3682505Z adding: test/test-reports/distributed.checkpoint.test_dtensor_resharding_1.1_a0990bee4dfbe749_.log (deflated 91%) 2025-12-04T12:25:14.3683269Z adding: test/test-reports/distributed.fsdp.test_fsdp_memory_1.1_ac8e61e17ebeaaa5_.log (deflated 75%) 2025-12-04T12:25:14.3684541Z adding: test/test-reports/distributed.tensor.test_pointwise_ops_1.1_fc7ea695ae4d24dd_.log (deflated 77%) 
2025-12-04T12:25:14.3685085Z adding: test/test-reports/distributed.checkpoint.test_compatibility_1.1_995845a47bb8bc7e_.log (deflated 65%) 2025-12-04T12:25:14.3685673Z adding: test/test-reports/distributed._tools.test_mem_tracker_1.1_c5962f3ebcf85955_.log (deflated 61%) 2025-12-04T12:25:14.3686612Z adding: test/test-reports/distributed.elastic.test_control_plane_1.1_74d942263f51456c_.log (deflated 77%) 2025-12-04T12:25:14.3687350Z adding: test/test-reports/distributed.test_fake_pg_1.1_ecf9a296b2457f78_.log (deflated 75%) 2025-12-04T12:25:14.3690334Z adding: test/test-reports/distributed.checkpoint.test_fsdp_model_state_1.1_0d5362771b48c12a_.log (deflated 94%) 2025-12-04T12:25:14.3691882Z adding: test/test-reports/distributed.test_functional_api_1.1_d60bb00edf6e8a81_.log (deflated 84%) 2025-12-04T12:25:14.3692681Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_clip_grad_norm__1.1_76ba1390d272d622_.log (deflated 69%) 2025-12-04T12:25:14.3693350Z adding: test/test-reports/distributed.tensor.debug.test_comm_mode_1.1_40ca723c6c817b86_.log (deflated 62%) 2025-12-04T12:25:14.3696451Z adding: test/test-reports/distributed.test_dist2_1.1_cc2e2f70acaf1086_.log (deflated 88%) 2025-12-04T12:25:14.3697492Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_grad_scaler_1.1_5aa2313403ba4568_.log (deflated 61%) 2025-12-04T12:25:14.3700595Z adding: test/test-reports/distributed.launcher.test_run_1.1_b22d13de769d84ff_.log (deflated 89%) 2025-12-04T12:25:14.3701575Z adding: test/test-reports/distributed.fsdp.test_fsdp_backward_prefetch_1.1_29df4062c54c1e1a_.log (deflated 66%) 2025-12-04T12:25:14.3704027Z adding: test/test-reports/distributed.checkpoint.test_checkpoint_1.1_d7eb3fb6652ade87_.log (deflated 91%) 2025-12-04T12:25:14.3704563Z adding: test/test-reports/distributed._pycute.test_coalesce_1.1_b9854b582e22535e_.log (deflated 53%) 2025-12-04T12:25:14.3705062Z adding: test/test-reports/distributed._pycute.test_complement_1.1_ccd05958479ced51_.log (deflated 54%) 2025-12-04T12:25:14.3705754Z adding: test/test-reports/distributed._pycute.test_composition_1.1_6a9f660c56ddbb95_.log (deflated 54%) 2025-12-04T12:25:14.3706444Z adding: test/test-reports/distributed._pycute.test_int_tuple_1.1_1b6829b59a3a12af_.log (deflated 75%) 2025-12-04T12:25:14.3706978Z adding: test/test-reports/distributed._pycute.test_left_inverse_1.1_e810fe2e4745b377_.log (deflated 54%) 2025-12-04T12:25:14.3707644Z adding: test/test-reports/distributed._pycute.test_right_inverse_1.1_c9aa035dc9548e77_.log (deflated 54%) 2025-12-04T12:25:14.3709455Z adding: test/test-reports/distributed._composable.test_replicate_1.1_ede2d02b7e8a4250_.log (deflated 89%) 2025-12-04T12:25:14.3712487Z adding: test/test-reports/distributed.checkpoint.test_hsdp_checkpoint_1.1_38b6379e9fe79671_.log (deflated 94%) 2025-12-04T12:25:14.3715061Z adding: test/test-reports/distributed.tensor.parallel.test_parallelize_api_1.1_a79c3b02a80366e9_.log (deflated 88%) 2025-12-04T12:25:14.3738711Z adding: test/test-reports/distributed.fsdp.test_fsdp_state_dict_1.2_f864b6fe160d675b_.log (deflated 97%) 2025-12-04T12:25:14.3739180Z adding: test/test-reports/distributed._pycute.test_typing_1.1_70d9a252095d6a68_.log (deflated 53%) 2025-12-04T12:25:14.3739598Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_8732ec05eb19aa05_.log (deflated 12%) 2025-12-04T12:25:14.3740024Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_28ca104a37c9a833_.log (deflated 12%) 2025-12-04T12:25:14.3740448Z adding: 
test/test-reports/distributed.test_distributed_spawn_1.9_4a0940f8014b8eef_.log (deflated 83%) 2025-12-04T12:25:14.3741143Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_dc17769dd5c2239f_.log (deflated 83%) 2025-12-04T12:25:14.3748478Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_3cbdf0379e4c6767_.log (deflated 93%) 2025-12-04T12:25:14.3755513Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_25c7f8918b3d0b51_.log (deflated 93%) 2025-12-04T12:25:14.3763570Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_6f55519eb0301937_.log (deflated 94%) 2025-12-04T12:25:14.3771621Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_c42c9aaca0d3f434_.log (deflated 94%) 2025-12-04T12:25:14.3772202Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_cfb55a01555794b3_.log (deflated 12%) 2025-12-04T12:25:14.3772599Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_5d1f467e5bbdaff2_.log (deflated 12%) 2025-12-04T12:25:14.3773009Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_b5a10ee12046d5b9_.log (deflated 82%) 2025-12-04T12:25:14.3773582Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_de48cc4d8d8e3c13_.log (deflated 82%) 2025-12-04T12:25:14.3781231Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_5fb338ab863a3c8f_.log (deflated 93%) 2025-12-04T12:25:14.3788652Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_024341bf790fe69a_.log (deflated 93%) 2025-12-04T12:25:14.3798199Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_807ef3b254ee9578_.log (deflated 94%) 2025-12-04T12:25:14.3807699Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_a98bc48b8a2bbb0a_.log (deflated 94%) 2025-12-04T12:25:14.3808269Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_e6318e4f5e3f044b_.log (deflated 12%) 2025-12-04T12:25:14.3808674Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_7d14db48d459fad6_.log (deflated 12%) 2025-12-04T12:25:14.3809072Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_867e6ca715844bef_.log (deflated 82%) 2025-12-04T12:25:14.3809595Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_e3e9b753abf00510_.log (deflated 82%) 2025-12-04T12:25:14.3816968Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_57c28f64236fb5f7_.log (deflated 93%) 2025-12-04T12:25:14.3824825Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_e15417bf2d6aa02d_.log (deflated 93%) 2025-12-04T12:25:14.3832937Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_7faf7d03bb4df9a2_.log (deflated 94%) 2025-12-04T12:25:14.3840598Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_99251297b874e698_.log (deflated 94%) 2025-12-04T12:25:14.3841284Z adding: test/test-reports/distributed.test_serialization_1.1_13a719996bf7ed77_.log (deflated 73%) 2025-12-04T12:25:14.3843103Z adding: test/test-reports/distributed.fsdp.test_fsdp_ignored_modules_1.1_10f1fa8ebe15ff14_.log (deflated 84%) 2025-12-04T12:25:14.3918011Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_comm_1.1_365cd7de0daee87d_.log (deflated 95%) 2025-12-04T12:25:14.3921327Z adding: test/test-reports/distributed.fsdp.test_fsdp_sharded_grad_scaler_1.1_be49dd131ba0d1a6_.log (deflated 95%) 2025-12-04T12:25:14.3922769Z adding: test/test-reports/distributed._shard.sharding_plan.test_sharding_plan_1.1_abd5760a3cc4b6ac_.log (deflated 88%) 
2025-12-04T12:25:14.3924582Z adding: test/test-reports/distributed._shard.sharded_optim.test_sharded_optim_1.1_eb895e054ba35bc4_.log (deflated 91%) 2025-12-04T12:25:14.3925891Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_state_dict_1.1_b527545a7e0cfc76_.log (deflated 84%) 2025-12-04T12:25:14.3930029Z adding: test/test-reports/distributed.tensor.test_utils_1.1_adf864a1b1c1212f_.log (deflated 93%) 2025-12-04T12:25:14.3930880Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_memory_1.1_49e4cc8ab7bdec96_.log (deflated 64%) 2025-12-04T12:25:14.3961961Z adding: test/test-reports/distributed.checkpoint.test_state_dict_1.1_211422b52eb9ecc9_.log (deflated 98%) 2025-12-04T12:25:14.3962490Z adding: test/test-reports/distributed.checkpoint.test_state_dict_utils_1.1_53a76f3501a79ced_.log (deflated 85%) 2025-12-04T12:25:14.3963586Z adding: test/test-reports/distributed._shard.sharded_tensor.test_sharded_tensor_reshard_1.1_41e70f878ccc4095_.log (deflated 86%) 2025-12-04T12:25:14.3965415Z adding: test/test-reports/distributed.test_c10d_spawn_nccl_1.1_1bf221cec02d55ca_.log (deflated 91%) 2025-12-04T12:25:14.3966329Z adding: test/test-reports/distributed.test_c10d_spawn_ucc_1.1_5521268884e60126_.log (deflated 90%) 2025-12-04T12:25:14.4000546Z adding: test/test-reports/distributed.test_c10d_gloo_1.2_d5d0e2b1d744a982_.log (deflated 96%) 2025-12-04T12:25:14.4018922Z adding: test/test-reports/distributed._shard.sharded_tensor.test_sharded_tensor_1.1_24bd8bcdd0ba69c1_.log (deflated 96%) 2025-12-04T12:25:14.4037273Z adding: test/test-reports/distributed.test_c10d_nccl_3.3_41c01794b25a1cc6_.log (deflated 92%) 2025-12-04T12:25:14.4062171Z ##[group]Run # Remove any previous debugging artifacts if they exist 2025-12-04T12:25:14.4062405Z # Remove any previous debugging artifacts if they exist 2025-12-04T12:25:14.4062617Z rm -f debug-*.zip 2025-12-04T12:25:14.4062751Z if [ -d 'test/debug' ]; then 2025-12-04T12:25:14.4062918Z  zip -r "debug-${FILE_SUFFIX}.zip" test/debug 2025-12-04T12:25:14.4063017Z fi 2025-12-04T12:25:14.4068820Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:14.4068924Z env: 2025-12-04T12:25:14.4069039Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:14.4069263Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:14.4069437Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:14.4069734Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:14.4070028Z FILE_SUFFIX: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904 2025-12-04T12:25:14.4070113Z ##[endgroup] 2025-12-04T12:25:14.4150057Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:25:14.4150142Z with: 2025-12-04T12:25:14.4150243Z s3-bucket: gha-artifacts 2025-12-04T12:25:14.4150405Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:14.4150500Z retention-days: 14 2025-12-04T12:25:14.4150612Z if-no-files-found: warn 2025-12-04T12:25:14.4150712Z path: test-jsons-*.zip 2025-12-04T12:25:14.4150912Z name: artifact 2025-12-04T12:25:14.4151061Z region: us-east-1 2025-12-04T12:25:14.4151143Z env: 2025-12-04T12:25:14.4151240Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:14.4151345Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:14.4151510Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:14.4151807Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:14.4151907Z ##[endgroup] 2025-12-04T12:25:14.7856696Z NOTE: s3-prefix 
specified, ignoring name parameter 2025-12-04T12:25:14.7857754Z With the provided path, there will be 1 file uploaded 2025-12-04T12:25:14.7858389Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:14.7899472Z Starting upload of test-jsons-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:14.9376797Z Finished upload of test-jsons-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:14.9545958Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:25:14.9546339Z with: 2025-12-04T12:25:14.9546591Z s3-bucket: gha-artifacts 2025-12-04T12:25:14.9546964Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:14.9547364Z retention-days: 14 2025-12-04T12:25:14.9547661Z if-no-files-found: error 2025-12-04T12:25:14.9547986Z path: test-reports-*.zip 2025-12-04T12:25:14.9548289Z name: artifact 2025-12-04T12:25:14.9548542Z region: us-east-1 2025-12-04T12:25:14.9548910Z env: 2025-12-04T12:25:14.9549261Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:14.9549518Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:14.9549850Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:14.9550426Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:14.9550936Z ##[endgroup] 2025-12-04T12:25:15.2982840Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T12:25:15.2983399Z With the provided path, there will be 1 file uploaded 2025-12-04T12:25:15.2983964Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:15.3025921Z Starting upload of test-reports-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:15.4518021Z Finished upload of test-reports-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:15.4691523Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:25:15.4691864Z with: 2025-12-04T12:25:15.4692102Z s3-bucket: gha-artifacts 2025-12-04T12:25:15.4692438Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:15.4692820Z retention-days: 14 2025-12-04T12:25:15.4693074Z if-no-files-found: ignore 2025-12-04T12:25:15.4693356Z path: logs-*.zip 2025-12-04T12:25:15.4693597Z name: artifact 2025-12-04T12:25:15.4693943Z region: us-east-1 2025-12-04T12:25:15.4694178Z env: 2025-12-04T12:25:15.4694398Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:15.4694665Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:15.4695013Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:15.4695603Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:15.4696126Z ##[endgroup] 2025-12-04T12:25:15.8124846Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T12:25:15.8125380Z With the provided path, there will be 1 file uploaded 2025-12-04T12:25:15.8125908Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:15.8168536Z Starting upload of logs-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:15.9717742Z Finished upload of logs-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:15.9892073Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:25:15.9892442Z with: 2025-12-04T12:25:15.9892691Z s3-bucket: gha-artifacts 2025-12-04T12:25:15.9893047Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:15.9893420Z retention-days: 14 2025-12-04T12:25:15.9893859Z 
if-no-files-found: ignore 2025-12-04T12:25:15.9894232Z path: debug-*.zip 2025-12-04T12:25:15.9894475Z name: artifact 2025-12-04T12:25:15.9894726Z region: us-east-1 2025-12-04T12:25:15.9894974Z env: 2025-12-04T12:25:15.9895203Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:15.9895482Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:15.9895832Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:15.9896575Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:15.9897302Z ##[endgroup] 2025-12-04T12:25:16.3255482Z No files were found with the provided path: debug-*.zip. No artifacts will be uploaded. 2025-12-04T12:25:16.3438184Z ##[group]Run # shellcheck disable=SC2156 2025-12-04T12:25:16.3438754Z # shellcheck disable=SC2156 2025-12-04T12:25:16.3439498Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-12-04T12:25:16.3445917Z shell: /usr/bin/bash -e {0} 2025-12-04T12:25:16.3446264Z env: 2025-12-04T12:25:16.3446564Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:16.3447002Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:16.3447386Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:16.3448050Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:16.3448725Z ##[endgroup] 2025-12-04T12:25:16.6549878Z ##[group]Run seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a 2025-12-04T12:25:16.6550376Z with: 2025-12-04T12:25:16.6550742Z name: coredumps-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu 2025-12-04T12:25:16.6551201Z retention-days: 14 2025-12-04T12:25:16.6551456Z if-no-files-found: ignore 2025-12-04T12:25:16.6551737Z path: ./**/core.[1-9]* 2025-12-04T12:25:16.6552020Z s3-bucket: gha-artifacts 2025-12-04T12:25:16.6552279Z region: us-east-1 2025-12-04T12:25:16.6552514Z env: 2025-12-04T12:25:16.6552729Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:16.6553002Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:16.6553321Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:16.6553904Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:16.6554426Z ##[endgroup] 2025-12-04T12:25:24.2262062Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded. 
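Note on the core-dump step above: before the coredumps-* artifact upload, the job scans the workspace for kernel core files and, for each match, runs gdb inside the test container to print a backtrace. A minimal standalone sketch of that step is below; it assumes gdb and a python binary are present inside the container and reuses the DOCKER_CONTAINER_ID value from the job environment, as in the run block above.

#!/usr/bin/env bash
# Sketch of the core-dump backtrace step (assumes gdb + python inside the container).
set -u

# Container id comes from the job environment (DOCKER_CONTAINER_ID in the step env).
CONTAINER="${DOCKER_CONTAINER_ID}"

# Core files are written as core.<pid>; for each one, ask gdb inside the container
# for a backtrace ('bt') and then quit ('q'). GNU find substitutes {} with the path.
# shellcheck disable=SC2156
find . -iname "core.[1-9]*" -exec \
  docker exec "${CONTAINER}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \;

In this run the pattern matched nothing, so the subsequent coredumps-distributed-3-3 upload reported that no files were found and no artifacts were uploaded.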
2025-12-04T12:25:24.2494250Z Prepare all required actions 2025-12-04T12:25:24.2494652Z Getting action download info 2025-12-04T12:25:24.4261238Z Download action repository 'actions/setup-python@v6' (SHA:83679a892e2d95755f2dac6acb0bfd1e9ac5d548) 2025-12-04T12:25:24.8893517Z ##[group]Run ./.github/actions/upload-utilization-stats 2025-12-04T12:25:24.8893943Z with: 2025-12-04T12:25:24.8894272Z job_id: 57116084904 2025-12-04T12:25:24.8894919Z job_name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T12:25:24.8895635Z workflow_name: trunk 2025-12-04T12:25:24.8895928Z workflow_run_id: 19922768520 2025-12-04T12:25:24.8896234Z workflow_attempt: 1 2025-12-04T12:25:24.8896627Z env: 2025-12-04T12:25:24.8897037Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:24.8897330Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:24.8897786Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:24.8898490Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:24.8899057Z ##[endgroup] 2025-12-04T12:25:24.8942466Z ##[group]Run actions/setup-python@v6 2025-12-04T12:25:24.8942820Z with: 2025-12-04T12:25:24.8943060Z python-version: 3.10 2025-12-04T12:25:24.8943358Z check-latest: false 2025-12-04T12:25:24.8943763Z token: *** 2025-12-04T12:25:24.8944037Z update-environment: true 2025-12-04T12:25:24.8944359Z allow-prereleases: false 2025-12-04T12:25:24.8944678Z freethreaded: false 2025-12-04T12:25:24.8944955Z env: 2025-12-04T12:25:24.8945186Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:24.8945489Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:24.8945939Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:24.8966609Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:24.8967198Z ##[endgroup] 2025-12-04T12:25:25.0513055Z ##[group]Installed versions 2025-12-04T12:25:25.0522723Z Version 3.10 was not found in the local cache 2025-12-04T12:25:25.0711520Z (node:434397) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead. 2025-12-04T12:25:25.0712447Z (Use `node --trace-deprecation ...` to show where the warning was created) 2025-12-04T12:25:25.4181618Z ##[error]The version '3.10' with architecture 'x64' was not found for this operating system. 
The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json 2025-12-04T12:25:25.4342083Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2025-12-04T12:25:25.4342590Z with: 2025-12-04T12:25:25.4342833Z env: 2025-12-04T12:25:25.4343076Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:25.4343374Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:25.4343744Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:25.4344402Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:25.4344968Z ##[endgroup] 2025-12-04T12:25:25.4362013Z ##[group]Run set -eou pipefail 2025-12-04T12:25:25.4362348Z set -eou pipefail 2025-12-04T12:25:25.4362610Z  2025-12-04T12:25:25.4362994Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2025-12-04T12:25:25.4363481Z for _ in $(seq 1440); do 2025-12-04T12:25:25.4363830Z  # Break if no ssh session exists anymore 2025-12-04T12:25:25.4364189Z  if [ "$(who)" = "" ]; then 2025-12-04T12:25:25.4364543Z  break 2025-12-04T12:25:25.4364773Z  fi 2025-12-04T12:25:25.4365004Z  echo "." 2025-12-04T12:25:25.4365255Z  sleep 5 2025-12-04T12:25:25.4365484Z done 2025-12-04T12:25:25.4371135Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:25.4371528Z env: 2025-12-04T12:25:25.4371752Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:25.4372018Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:25.4372347Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:25.4372931Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:25.4373439Z ##[endgroup] 2025-12-04T12:25:25.4399719Z Holding runner for 2 hours until all ssh sessions have logged out 2025-12-04T12:25:25.4480452Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T12:25:25.4481123Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T12:25:25.4481591Z # shellcheck disable=SC2046 2025-12-04T12:25:25.4481942Z docker stop $(docker ps -q) || true 2025-12-04T12:25:25.4482310Z # Prune all of the docker images 2025-12-04T12:25:25.4482646Z docker system prune -af 2025-12-04T12:25:25.4488144Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:25.4488536Z env: 2025-12-04T12:25:25.4488763Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:25.4489037Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:25.4489366Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:25.4489943Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:25.4490466Z ##[endgroup] 2025-12-04T12:25:36.4602247Z 9f53f9c599eb 2025-12-04T12:25:37.0897028Z Deleted Containers: 2025-12-04T12:25:37.0897582Z 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:37.0898010Z 2025-12-04T12:25:44.4573900Z Deleted Images: 2025-12-04T12:25:44.4574374Z untagged: public.ecr.aws/docker/library/python:3.13 2025-12-04T12:25:44.4575225Z untagged: public.ecr.aws/docker/library/python@sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0 2025-12-04T12:25:44.4576611Z deleted: sha256:44438aecfedf7b6086fce506dae0db5ba7fc0027f9b743f1a75a6b5cbc7de70a 2025-12-04T12:25:44.4577544Z deleted: sha256:6f09a1f5d8a107c2532fbd116e75116cb75fa77b1a7d72d3bdf1ac12de152acd 2025-12-04T12:25:44.4578310Z deleted: sha256:fe5f3ac0be086125eb1e3cd10cc33e8e426f4e079381f7ce5a987b626e99fa67 
2025-12-04T12:25:44.4579073Z deleted: sha256:79dd2061a22cf919cfc4f1f02704bfda09afadb017265e670ee54441d296c06c 2025-12-04T12:25:44.4579838Z deleted: sha256:9447ad402aafdbee17e999b0ec84ad89c2646dbebf054d469d4f8bee77f66212 2025-12-04T12:25:44.4580579Z deleted: sha256:7a4909f3c1975be52292f53107495ee1b41c17494918767ccedf1cf1688ae318 2025-12-04T12:25:44.4581308Z deleted: sha256:3474923d97f1f498237650a7d51bd4aea37d5e6b9d8a778777920584af5dd560 2025-12-04T12:25:44.4582268Z deleted: sha256:683afd1773444401a9cbd24842ee5d9154a11abb4fab63ddea5c03df788597ee 2025-12-04T12:25:44.4583459Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T12:25:44.4585024Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image@sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97 2025-12-04T12:25:44.4586096Z deleted: sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301 2025-12-04T12:25:44.4586849Z deleted: sha256:85a76b7bf29ad34eb76cce6f46af5d49a58b6272f80f983d5c769e82c7749301 2025-12-04T12:25:44.4587604Z deleted: sha256:0882f3ce59ff5ae30195ee4b059fc713e13eda107a3a7814a4616ac9058a30a4 2025-12-04T12:25:44.4588473Z deleted: sha256:64ba5b9344c11a3e4729136076830b90ac4cf1554046edb1bd4f0784b66ebd9b 2025-12-04T12:25:44.4589295Z deleted: sha256:88213c59cf461a65ab9b6cb07b4195dc9d41b5241c152daa002c7b3112e09124 2025-12-04T12:25:44.4590014Z deleted: sha256:4c0f83afa802ffbc05ebaf1aa50e48a2447c7c295549a6dded80ac63437906ca 2025-12-04T12:25:44.4590721Z deleted: sha256:6f7ec74460e8fb070c8209949095ea3be5f4e2fd69c9f750cd39ac4093f5e64b 2025-12-04T12:25:44.4591415Z deleted: sha256:d6928b0d1021b31942fdcb64e5eb4a34682de66e959dd424ed6ed02c29cd706d 2025-12-04T12:25:44.4592118Z deleted: sha256:4e9fbcb1705a6351bb34dd320558752614308636b94fd9ae6f26063e3deadc0a 2025-12-04T12:25:44.4592811Z deleted: sha256:43aabd0201f48712f21758071352dea029b4de37be08b2e2197706856a9ecbf2 2025-12-04T12:25:44.4593495Z deleted: sha256:940a98dec78303f0548beb1033242a45e9097607ef3e55c8b949b69b73d1b95e 2025-12-04T12:25:44.4594193Z deleted: sha256:d2849fa0e0411cf66e4408831d70e38838afb55b11a80c1c4d8aa0ae7dc9ca40 2025-12-04T12:25:44.4594886Z deleted: sha256:14f40d23c20c7e562623f89deb376520296758bc39dd3c77284049b84ebd8a31 2025-12-04T12:25:44.4595768Z deleted: sha256:a8ccba61f90ca097cb391d0f4fbed0d9f821d06b00e28f7332e9e2dcfcbac4ca 2025-12-04T12:25:44.4596566Z deleted: sha256:91b2060d290547d3b517d4a11d994bbe23f4560b5546cb91918ca1828dde6be1 2025-12-04T12:25:44.4597287Z deleted: sha256:b42a184755715dcfead7fad655a127433541d316d9628f5f730ff17ad5f8071c 2025-12-04T12:25:44.4598024Z deleted: sha256:aa5b4f3c9169061dc3c6da0e677e8a86f11ecb0a3f9fb4861ab3d8c04379775c 2025-12-04T12:25:44.4598742Z deleted: sha256:b4dcf450081a48d77fea0a21b8d810a69c03608a595e754fe7d365058d0579b7 2025-12-04T12:25:44.4599472Z deleted: sha256:4f7fe12d3d4f5bf890c7ada4ce16f17a105472aa6509a778f917dcce2f28174b 2025-12-04T12:25:44.4600207Z deleted: sha256:2d1d5a74182594f9a8553df00fdcfc809dba407bcd6700d667f862cbe9d555ce 2025-12-04T12:25:44.4600944Z deleted: sha256:d901e2f5d449aeed16b727bdcc11fc0e0f6c30c8fc5c39ac7eeac8a74d9d176c 2025-12-04T12:25:44.4601655Z deleted: sha256:a04df2603bd12372c6632469a9a81ebc4a8d677452c250672b9692884fa6a452 2025-12-04T12:25:44.4602371Z deleted: sha256:f438a6b52273a552dc3820a55c74c53a62a0eae9f2a7d21b37125add7d71639f 2025-12-04T12:25:44.4603092Z deleted: sha256:d4b09517e9518d709ac98b0ae6f8446ec9ac51688253607b1fca67aa2c87b3f4 
2025-12-04T12:25:44.4603919Z deleted: sha256:c1fa38335237f5e7263e39d3d3de98215bcfbbb12b826955c02e149bf68efd13 2025-12-04T12:25:44.4604606Z deleted: sha256:c898d20a30de901fca74d7611663b17ab48e1726a11e031e40548ed16ee81877 2025-12-04T12:25:44.4605344Z deleted: sha256:3baceec7096518fcc10696feba551639d698b3145c2fc09cac927bb60c0fd751 2025-12-04T12:25:44.4606049Z deleted: sha256:5245aaaa3d5c3a19f76b9a6c920bd82d1a0ff5289f87c8c109652089709d9b3b 2025-12-04T12:25:44.4606738Z deleted: sha256:f05cc789b95246938c377f474c41187965b89ceac0250e7d5124bec32153f447 2025-12-04T12:25:44.4607438Z deleted: sha256:07ec4fc008de4e7a2c794ec7094cc72e0d287c04c8b2156163aee0bae147fe2d 2025-12-04T12:25:44.4608151Z deleted: sha256:c6302601ad5fde573c1f8c900250478fca7fdc6907d8fd4fae651b94b4d9264d 2025-12-04T12:25:44.4608858Z deleted: sha256:cc5e955ee1dc54931f02606c5ea87aae14f03b5d764492be611480ab041f2882 2025-12-04T12:25:44.4609550Z deleted: sha256:f21c03518996d98452338f4e80bcfd9b139a1dab155f4830be0d3f623035269f 2025-12-04T12:25:44.4610351Z deleted: sha256:519ca6f1279f7886f25f0005527cfa627deebbc5b7d7cdbfa7ef962bcfc4c26d 2025-12-04T12:25:44.4611048Z deleted: sha256:0ef990495216807d0175b192045be3f617e72331bc373b3434807f41bf69168d 2025-12-04T12:25:44.4611911Z deleted: sha256:7093edf7319e1f0e01654c3224e32c8dede5b948d106e0b9b03cbf0bb1091e33 2025-12-04T12:25:44.4612633Z deleted: sha256:c478161e058e2f4041555c3e880b95ee1ee047938dc58549a3a88135740996ae 2025-12-04T12:25:44.4613356Z deleted: sha256:9bb853b0d938cd7c36a80ce8ee40653f2c0ff92719209b11beb03acc8855ce3e 2025-12-04T12:25:44.4614088Z deleted: sha256:fdf2ace71a78ce6910ef9c4b073c195531da47022443b606bb92dcd6499b6afc 2025-12-04T12:25:44.4614901Z deleted: sha256:576c2b3770d871937d3cfb7014328bcb4bd1aed0c28bc438764b3bfdac4c1ac2 2025-12-04T12:25:44.4615650Z deleted: sha256:878e92b9cb82de09ac14a9d5f3f7bc2411a799b6f54d0d64b78c2bb4d1fdc0fc 2025-12-04T12:25:44.4616462Z deleted: sha256:85c8c3b98b65a6695f988a10cc66c981d73a3ef03eda15b8e14d227b50b56300 2025-12-04T12:25:44.4617397Z deleted: sha256:ce2ab3ba07794f9ee95d6ea7de6dcd3d2aed96561f9a79192dd56ca5bf29313a 2025-12-04T12:25:44.4618138Z deleted: sha256:37a6e12976ca957286977e696e63012ab9821214b0483fe1a48d29dcb280508a 2025-12-04T12:25:44.4618880Z deleted: sha256:cd1d5d3dd7038144ca6fe961c0d4c8e705625ae0c36190ba8b3e9602abedad19 2025-12-04T12:25:44.4619627Z deleted: sha256:0e707276e0be2e0008b86d594fadc0d16444d66c4fb7227c56f144cbb3c2affd 2025-12-04T12:25:44.4620365Z deleted: sha256:22d4aad6a2ada91b341c1225a0f314042b8aeabef7568c5c019709b058bf070b 2025-12-04T12:25:44.4621344Z deleted: sha256:ee4adacf4e0933131d0275eddad406b3c8147e6cf07a292b99f1aff4b5355f33 2025-12-04T12:25:44.4622102Z deleted: sha256:43da0b9e7c0e18403dcb834e53628dc7c970ccb2dbd091878c0d7c0170dbc97f 2025-12-04T12:25:44.4622860Z deleted: sha256:00571684bdcd75beda15eb7d4e79b5458bc914350f9bb4d87fcdc97ad15e0da1 2025-12-04T12:25:44.4623596Z deleted: sha256:41615f09950259f1d75e82ef35b6fc53b18fe71ebff143744cfd51009d04349e 2025-12-04T12:25:44.4624425Z deleted: sha256:75ab34d2eed3c7915467a506ab6dab2711918fbabe94add2fb5c62780221ab0c 2025-12-04T12:25:44.4625188Z deleted: sha256:0a39ef2bebf44c1c3893d1e5fb42dad48b8fac7ca673141267ee967f85455e89 2025-12-04T12:25:44.4625938Z deleted: sha256:9b7d024e48ba1f9824a54597621b1b062cbc4aa41a77d81ca538d6b5c24a612c 2025-12-04T12:25:44.4626687Z deleted: sha256:392257172de6434c271bd93394218a91e9aa86d7c18abc2f2759317b9d5fb6de 2025-12-04T12:25:44.4627414Z deleted: sha256:6c3232860b930866a463a356124fc392c7e5f04895695229257e8c3e8a02711d 2025-12-04T12:25:44.4628152Z 
deleted: sha256:63dd55b807215e2fa6c715419ac0c5072d02dddc848dbf74bb7e77b906b5eaed 2025-12-04T12:25:44.4628886Z deleted: sha256:07a8738c1b4584db72ed9aa60f5274321eb0ba16263450da3a75df8326ebc25f 2025-12-04T12:25:44.4629623Z deleted: sha256:053fe2965b01281d12040ec1893e0d1aa77362a49ea9a1067402272c69dad9f5 2025-12-04T12:25:44.4630364Z deleted: sha256:7857fb5eb181c4e80262ecab60bdd3c266cf3d1409ceb76c05882609b416a8d3 2025-12-04T12:25:44.4631104Z deleted: sha256:752528477fc99089de3bd2c6da7b30cf34f2e901fe06d8fcfe685b411461e883 2025-12-04T12:25:44.4631858Z deleted: sha256:cce0210e2f4b042601813df03aa294a86b0c668fcfc75f4c63f6fa12b2952e15 2025-12-04T12:25:44.4632711Z deleted: sha256:f2bb405a26705ecd12d21380d26d9355d01db3a2175080fbdb468f2b5a25a76c 2025-12-04T12:25:44.4633619Z deleted: sha256:ad430120d4ffbaf97cd8d6de6ea8eefa4a8f80ec45f0b176c6b26bff0970fd33 2025-12-04T12:25:44.4634287Z deleted: sha256:225a4910baea7cc540ed43eeac75046293800ab0b8e0192b51e991c8cb50bcf3 2025-12-04T12:25:44.4634996Z deleted: sha256:a259945b0c3507f049fbac10fb3d3ffe43d45e83c91b80ae8cd1dafb855ad83c 2025-12-04T12:25:44.4635653Z deleted: sha256:862a98881b1d5adad5c21d01602773b894794097de80964ef8f47bcaadb43255 2025-12-04T12:25:44.4636310Z deleted: sha256:1cf6d3c8b6c2694b79a2d08719594903811c330a36a4c7a8a7153a350b53d292 2025-12-04T12:25:44.4636978Z deleted: sha256:232a1ae8b0fee817ff7838bb5986a2f38377d3b1dbbf5217b576af0f953b0844 2025-12-04T12:25:44.4637650Z deleted: sha256:c72c5705dabd6314423dd7d4fb260a20d5d9886b2ebce60d19e9d78c4a2335c2 2025-12-04T12:25:44.4638384Z deleted: sha256:296734cf81fd92c913884d058908598424ffe072676e38de289bbab83768c7bd 2025-12-04T12:25:44.4639044Z deleted: sha256:7c76040481b889847a1804021aeff07547eaa4ee706d6137db218d497a8fd9c1 2025-12-04T12:25:44.4639717Z deleted: sha256:d5e293f5b354e8cbcc6de893ea72cc632b02d8fdfbb08ec3127c4e9662f3ebff 2025-12-04T12:25:44.4640379Z deleted: sha256:f35a64e429c88e249645090f21fbe7dae108d98e0ab4ea13184f24b3fd66c315 2025-12-04T12:25:44.4641048Z deleted: sha256:ce6ae8d595c8e69115c51b1ce4f9a9158795d7b863b1cb53f21c39a87974d41b 2025-12-04T12:25:44.4641722Z deleted: sha256:8941abaee59400fb9b3a60765fea4a1fc2a6a447467a6d983e84c7f72494a450 2025-12-04T12:25:44.4642401Z deleted: sha256:ef53c29a9a2c2bc80ffdb9bfaf92842436b5755ec1ce828b9d11e5e27d656ea1 2025-12-04T12:25:44.4643069Z deleted: sha256:7a347fb0acb43f1c814f8c8ff21185e8b5cf64d7bc5988cea060f77d906e08b5 2025-12-04T12:25:44.4643751Z deleted: sha256:cc855dc9be79496e15175569dced2d13477e50b077a5fd3945f9bf50018880c1 2025-12-04T12:25:44.4644426Z deleted: sha256:f7a9946ada3d4786658bc0b643808bb32a9a45e4e90e30dc43ee19e2dbe24024 2025-12-04T12:25:44.4645096Z deleted: sha256:c22a9215f62812c1d2e32827f5221ff556c5b6702aadbdab6b87b8293f19635e 2025-12-04T12:25:44.4645747Z deleted: sha256:959a56746620012e37c1def1a83c5afb1e7c0adc59b021a28beb53c24df98032 2025-12-04T12:25:44.4646419Z deleted: sha256:31a0fff0695bf6100c17954be72eab2095b466d559c75c3faf2a17d8c41e6ebe 2025-12-04T12:25:44.4647088Z deleted: sha256:c15e2b5241b9e55af1b2593e544391b4b44d0505e6528e8f12425136e93b424c 2025-12-04T12:25:44.4647733Z deleted: sha256:73974f74b436f39a2fdb6461b1e3f7c3e41c73325776fa71d16b942a5b4a365b 2025-12-04T12:25:44.4648137Z 2025-12-04T12:25:44.4648261Z Total reclaimed space: 36.15GB 2025-12-04T12:25:44.4684415Z ##[group]Run set +e 2025-12-04T12:25:44.4684805Z set +e 2025-12-04T12:25:44.4685054Z set -x 2025-12-04T12:25:44.4685365Z  2025-12-04T12:25:44.4685604Z nvidia-smi 2025-12-04T12:25:44.4686180Z # NB: Surprisingly, nvidia-smi command returns successfully with return code 0 even 
in 2025-12-04T12:25:44.4686978Z # the case where the driver has already crashed as it still can get the driver version 2025-12-04T12:25:44.4687753Z # and some basic information like the bus ID. However, the rest of the information 2025-12-04T12:25:44.4688451Z # would be missing (ERR!), for example: 2025-12-04T12:25:44.4688898Z # 2025-12-04T12:25:44.4689219Z # +-----------------------------------------------------------------------------+ 2025-12-04T12:25:44.4689782Z # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | 2025-12-04T12:25:44.4690364Z # |-------------------------------+----------------------+----------------------+ 2025-12-04T12:25:44.4690922Z # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T12:25:44.4691531Z # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T12:25:44.4692037Z # | | | MIG M. | 2025-12-04T12:25:44.4692417Z # |===============================+======================+======================| 2025-12-04T12:25:44.4692902Z # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | 2025-12-04T12:25:44.4693413Z # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | 2025-12-04T12:25:44.4693875Z # | | | ERR! | 2025-12-04T12:25:44.4694312Z # +-------------------------------+----------------------+----------------------+ 2025-12-04T12:25:44.4694714Z # 2025-12-04T12:25:44.4695030Z # +-----------------------------------------------------------------------------+ 2025-12-04T12:25:44.4695514Z # | Processes: | 2025-12-04T12:25:44.4696004Z # | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T12:25:44.4696594Z # | ID ID Usage | 2025-12-04T12:25:44.4697194Z # |=============================================================================| 2025-12-04T12:25:44.4697700Z # +-----------------------------------------------------------------------------+ 2025-12-04T12:25:44.4698136Z # 2025-12-04T12:25:44.4698594Z # This should be reported as a failure instead as it will guarantee to fail when 2025-12-04T12:25:44.4699201Z # Docker tries to run with --gpus all 2025-12-04T12:25:44.4699565Z # 2025-12-04T12:25:44.4699994Z # So, the correct check here is to query one of the missing piece of info like 2025-12-04T12:25:44.4700618Z # GPU name, so that the command can fail accordingly 2025-12-04T12:25:44.4701195Z nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T12:25:44.4701680Z NVIDIA_SMI_STATUS=$? 2025-12-04T12:25:44.4701989Z  2025-12-04T12:25:44.4702500Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T12:25:44.4703257Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T12:25:44.4703949Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 2025-12-04T12:25:44.4704541Z  .github/scripts/stop_runner_service.sh 2025-12-04T12:25:44.4704925Z fi 2025-12-04T12:25:44.4705153Z  2025-12-04T12:25:44.4705824Z # For runner with multiple GPUs, we also want to confirm that the number of GPUs are the 2025-12-04T12:25:44.4706571Z # power of 2, i.e. 1, 2, 4, or 8. This is to avoid flaky test issue when one GPU fails 2025-12-04T12:25:44.4707242Z # https://github.com/pytorch/test-infra/issues/4000 2025-12-04T12:25:44.4707736Z GPU_COUNT=$(nvidia-smi --list-gpus | wc -l) 2025-12-04T12:25:44.4708153Z NVIDIA_SMI_STATUS=$? 
2025-12-04T12:25:44.4708463Z  2025-12-04T12:25:44.4709073Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T12:25:44.4709751Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T12:25:44.4710365Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 2025-12-04T12:25:44.4710896Z  .github/scripts/stop_runner_service.sh 2025-12-04T12:25:44.4711224Z fi 2025-12-04T12:25:44.4711445Z  2025-12-04T12:25:44.4711703Z # Check the GPU count to be a power of 2 2025-12-04T12:25:44.4712276Z if [ "$GPU_COUNT" -le 8 ] && [ "$GPU_COUNT" -ne 1 ] && [ "$GPU_COUNT" -ne 2 ] && [ "$GPU_COUNT" -ne 4 ] && [ "$GPU_COUNT" -ne 8 ]; then 2025-12-04T12:25:44.4713064Z  echo "NVIDIA driver detects $GPU_COUNT GPUs. The runner has a broken GPU, shutting it down..." 2025-12-04T12:25:44.4713662Z  .github/scripts/stop_runner_service.sh 2025-12-04T12:25:44.4714035Z fi 2025-12-04T12:25:44.4723799Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:44.4724248Z env: 2025-12-04T12:25:44.4724503Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:44.4724807Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:44.4725179Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:44.4725834Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:44.4726411Z ##[endgroup] 2025-12-04T12:25:44.4754605Z + nvidia-smi 2025-12-04T12:25:44.5218063Z Thu Dec 4 12:25:44 2025 2025-12-04T12:25:44.5218557Z +-----------------------------------------------------------------------------------------+ 2025-12-04T12:25:44.5219198Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T12:25:44.5219825Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:25:44.5220458Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T12:25:44.5221363Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T12:25:44.5221915Z | | | MIG M. 
| 2025-12-04T12:25:44.5222323Z |=========================================+========================+======================| 2025-12-04T12:25:44.5878833Z | 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 | 2025-12-04T12:25:44.5879441Z | N/A 27C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:25:44.5879924Z | | | N/A | 2025-12-04T12:25:44.5880425Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:25:44.5880966Z | 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 | 2025-12-04T12:25:44.5881470Z | N/A 26C P8 9W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:25:44.5881928Z | | | N/A | 2025-12-04T12:25:44.5882415Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:25:44.5882949Z | 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 | 2025-12-04T12:25:44.5883682Z | N/A 25C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:25:44.5884158Z | | | N/A | 2025-12-04T12:25:44.5884713Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:25:44.5885230Z | 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 | 2025-12-04T12:25:44.5885746Z | N/A 26C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:25:44.5886220Z | | | N/A | 2025-12-04T12:25:44.5886705Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:25:44.5891414Z 2025-12-04T12:25:44.5891737Z +-----------------------------------------------------------------------------------------+ 2025-12-04T12:25:44.5892503Z | Processes: | 2025-12-04T12:25:44.5893056Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T12:25:44.5893629Z | ID ID Usage | 2025-12-04T12:25:44.5894044Z |=========================================================================================| 2025-12-04T12:25:44.5914946Z | No running processes found | 2025-12-04T12:25:44.5915737Z +-----------------------------------------------------------------------------------------+ 2025-12-04T12:25:45.2630653Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T12:25:45.2812292Z Tesla T4 2025-12-04T12:25:45.3086896Z + NVIDIA_SMI_STATUS=0 2025-12-04T12:25:45.3087309Z + '[' 0 -ne 0 ']' 2025-12-04T12:25:45.3092368Z ++ nvidia-smi --list-gpus 2025-12-04T12:25:45.3093948Z ++ wc -l 2025-12-04T12:25:45.3557742Z + GPU_COUNT=4 2025-12-04T12:25:45.3558041Z + NVIDIA_SMI_STATUS=0 2025-12-04T12:25:45.3558509Z + '[' 0 -ne 0 ']' 2025-12-04T12:25:45.3558863Z + '[' 4 -le 8 ']' 2025-12-04T12:25:45.3559114Z + '[' 4 -ne 1 ']' 2025-12-04T12:25:45.3559345Z + '[' 4 -ne 2 ']' 2025-12-04T12:25:45.3559599Z + '[' 4 -ne 4 ']' 2025-12-04T12:25:45.3644017Z Post job cleanup. 2025-12-04T12:25:45.3725670Z Post job cleanup. 2025-12-04T12:25:45.3773374Z Post job cleanup. 
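Note on the GPU health check above: the teardown step does not trust nvidia-smi's exit code alone, because the tool can still return 0 after a driver crash while printing ERR! for most fields. It therefore queries a field that would go missing (the GPU name) and also verifies that the visible GPU count is a power of two (1, 2, 4, or 8), so a runner with one dead GPU is retired rather than producing flaky tests. A condensed sketch of that check, under the same assumptions (nvidia-smi on PATH; the repo's .github/scripts/stop_runner_service.sh used to retire the runner), is below.

#!/usr/bin/env bash
# Condensed sketch of the teardown GPU health check.
set +e

# Query a field that is missing after a driver crash; exit code 14 is also
# accepted, matching the setup-nvidia action the comment refers to.
nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
status=$?
if [ "$status" -ne 0 ] && [ "$status" -ne 14 ]; then
  echo "NVIDIA driver installation has failed, shutting down the runner..."
  .github/scripts/stop_runner_service.sh
fi

# A healthy runner exposes 1, 2, 4, or 8 GPUs; anything else (e.g. 3 of 4 after
# one GPU failure) marks the runner as broken.
gpu_count=$(nvidia-smi --list-gpus | wc -l)
case "$gpu_count" in
  1|2|4|8) ;;  # ok
  *)
    echo "NVIDIA driver detects $gpu_count GPUs. The runner has a broken GPU, shutting it down..."
    .github/scripts/stop_runner_service.sh
    ;;
esac

In this run the query returned "Tesla T4" and four GPUs were listed, so both checks passed and post-job cleanup continued.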
2025-12-04T12:25:45.4785224Z [command]/usr/bin/git version 2025-12-04T12:25:45.4825455Z git version 2.50.1 2025-12-04T12:25:45.4863003Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/53556366-c1b9-4fa8-82d6-046dda343be8/.gitconfig' 2025-12-04T12:25:45.4872140Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/53556366-c1b9-4fa8-82d6-046dda343be8' before making global git config changes 2025-12-04T12:25:45.4873208Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T12:25:45.4877477Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T12:25:45.4927167Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T12:25:45.4959307Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T12:25:45.5291118Z Entering 'android/libs/fbjni' 2025-12-04T12:25:45.5347783Z Entering 'third_party/FP16' 2025-12-04T12:25:45.5408368Z Entering 'third_party/FXdiv' 2025-12-04T12:25:45.5465841Z Entering 'third_party/NNPACK' 2025-12-04T12:25:45.5524039Z Entering 'third_party/NVTX' 2025-12-04T12:25:45.5583575Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T12:25:45.5641907Z Entering 'third_party/XNNPACK' 2025-12-04T12:25:45.5718283Z Entering 'third_party/aiter' 2025-12-04T12:25:45.5777971Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T12:25:45.5846017Z Entering 'third_party/benchmark' 2025-12-04T12:25:45.5905950Z Entering 'third_party/composable_kernel' 2025-12-04T12:25:45.5976806Z Entering 'third_party/cpp-httplib' 2025-12-04T12:25:45.6038380Z Entering 'third_party/cpuinfo' 2025-12-04T12:25:45.6097051Z Entering 'third_party/cudnn_frontend' 2025-12-04T12:25:45.6157866Z Entering 'third_party/cutlass' 2025-12-04T12:25:45.6229603Z Entering 'third_party/fbgemm' 2025-12-04T12:25:45.6291358Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T12:25:45.6347193Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T12:25:45.6415720Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T12:25:45.6478895Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T12:25:45.6546208Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T12:25:45.6602444Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T12:25:45.6667321Z Entering 'third_party/fbgemm/external/json' 2025-12-04T12:25:45.6730048Z Entering 'third_party/flash-attention' 2025-12-04T12:25:45.6792693Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T12:25:45.6855382Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T12:25:45.6927997Z Entering 'third_party/flatbuffers' 2025-12-04T12:25:45.6992991Z Entering 'third_party/fmt' 2025-12-04T12:25:45.7053075Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T12:25:45.7113316Z Entering 'third_party/gloo' 2025-12-04T12:25:45.7173059Z Entering 'third_party/googletest' 2025-12-04T12:25:45.7231156Z Entering 'third_party/ideep' 2025-12-04T12:25:45.7292334Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T12:25:45.7359927Z Entering 'third_party/ittapi' 2025-12-04T12:25:45.7418095Z Entering 'third_party/kineto' 2025-12-04T12:25:45.7482862Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T12:25:45.7538560Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T12:25:45.7599249Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T12:25:45.7656981Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T12:25:45.7717701Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T12:25:45.7778640Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T12:25:45.7840938Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T12:25:45.7902245Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T12:25:45.7961635Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T12:25:45.8018643Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T12:25:45.8076662Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T12:25:45.8132282Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:45.8195453Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:45.8263333Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T12:25:45.8319985Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T12:25:45.8383644Z Entering 'third_party/kleidiai' 2025-12-04T12:25:45.8444646Z Entering 'third_party/mimalloc' 2025-12-04T12:25:45.8501091Z Entering 'third_party/nlohmann' 2025-12-04T12:25:45.8563307Z Entering 'third_party/onnx' 2025-12-04T12:25:45.8644063Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T12:25:45.8701635Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T12:25:45.8763271Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T12:25:45.8818510Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T12:25:45.8876378Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T12:25:45.8936041Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T12:25:45.8995519Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T12:25:45.9051973Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T12:25:45.9108009Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T12:25:45.9172715Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:45.9236376Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:45.9300444Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T12:25:45.9378599Z Entering 'third_party/pocketfft' 2025-12-04T12:25:45.9438514Z Entering 'third_party/protobuf' 2025-12-04T12:25:45.9499095Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T12:25:45.9556764Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T12:25:45.9614970Z Entering 'third_party/psimd' 2025-12-04T12:25:45.9675284Z Entering 'third_party/pthreadpool' 2025-12-04T12:25:45.9739370Z Entering 'third_party/pybind11' 2025-12-04T12:25:45.9797792Z Entering 'third_party/python-peachpy' 2025-12-04T12:25:45.9857812Z Entering 'third_party/sleef' 2025-12-04T12:25:45.9916294Z Entering 'third_party/tensorpipe' 2025-12-04T12:25:45.9976200Z 
Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T12:25:46.0039257Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T12:25:46.0095811Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T12:25:46.0155782Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T12:25:46.0211953Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T12:25:46.0295344Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T12:25:46.0319302Z http.https://github.com/.extraheader 2025-12-04T12:25:46.0328351Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T12:25:46.0360241Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T12:25:46.0679000Z Entering 'android/libs/fbjni' 2025-12-04T12:25:46.0718659Z http.https://github.com/.extraheader 2025-12-04T12:25:46.0760182Z Entering 'third_party/FP16' 2025-12-04T12:25:46.0799891Z http.https://github.com/.extraheader 2025-12-04T12:25:46.0837131Z Entering 'third_party/FXdiv' 2025-12-04T12:25:46.0877434Z http.https://github.com/.extraheader 2025-12-04T12:25:46.0921800Z Entering 'third_party/NNPACK' 2025-12-04T12:25:46.0962166Z http.https://github.com/.extraheader 2025-12-04T12:25:46.0996823Z Entering 'third_party/NVTX' 2025-12-04T12:25:46.1038126Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1075859Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T12:25:46.1115740Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1153449Z Entering 'third_party/XNNPACK' 2025-12-04T12:25:46.1193001Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1247252Z Entering 'third_party/aiter' 2025-12-04T12:25:46.1288197Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1323062Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T12:25:46.1362872Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1412158Z Entering 'third_party/benchmark' 2025-12-04T12:25:46.1452902Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1490898Z Entering 'third_party/composable_kernel' 2025-12-04T12:25:46.1531090Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1576693Z Entering 'third_party/cpp-httplib' 2025-12-04T12:25:46.1617507Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1653763Z Entering 'third_party/cpuinfo' 2025-12-04T12:25:46.1695460Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1735249Z Entering 'third_party/cudnn_frontend' 2025-12-04T12:25:46.1774962Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1815757Z Entering 'third_party/cutlass' 2025-12-04T12:25:46.1857980Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1908744Z Entering 'third_party/fbgemm' 2025-12-04T12:25:46.1948782Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1988875Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T12:25:46.2031032Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2065391Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T12:25:46.2102519Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2148258Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T12:25:46.2187783Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2223433Z Entering 'third_party/fbgemm/external/cutlass' 
2025-12-04T12:25:46.2261259Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2307073Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T12:25:46.2345638Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2390740Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T12:25:46.2427595Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2468172Z Entering 'third_party/fbgemm/external/json' 2025-12-04T12:25:46.2507988Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2544883Z Entering 'third_party/flash-attention' 2025-12-04T12:25:46.2583076Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2619791Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T12:25:46.2657993Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2698494Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T12:25:46.2737965Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2782643Z Entering 'third_party/flatbuffers' 2025-12-04T12:25:46.2821676Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2861298Z Entering 'third_party/fmt' 2025-12-04T12:25:46.2900032Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2937961Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T12:25:46.2978050Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3014781Z Entering 'third_party/gloo' 2025-12-04T12:25:46.3053813Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3093160Z Entering 'third_party/googletest' 2025-12-04T12:25:46.3131833Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3168968Z Entering 'third_party/ideep' 2025-12-04T12:25:46.3209040Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3243807Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T12:25:46.3281364Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3327994Z Entering 'third_party/ittapi' 2025-12-04T12:25:46.3368609Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3404306Z Entering 'third_party/kineto' 2025-12-04T12:25:46.3444009Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3479894Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T12:25:46.3517956Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3556286Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T12:25:46.3595064Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3636286Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T12:25:46.3673613Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3708649Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T12:25:46.3748799Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3794124Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T12:25:46.3831395Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3867101Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T12:25:46.3906436Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3944059Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T12:25:46.3982259Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4018658Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T12:25:46.4057888Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4096669Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T12:25:46.4137403Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4179920Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T12:25:46.4217940Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4255148Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T12:25:46.4293315Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4329311Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:46.4369393Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4410129Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:46.4449859Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4491244Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T12:25:46.4538204Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4575323Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T12:25:46.4613110Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4650396Z Entering 'third_party/kleidiai' 2025-12-04T12:25:46.4690501Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4725786Z Entering 'third_party/mimalloc' 2025-12-04T12:25:46.4767027Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4803682Z Entering 'third_party/nlohmann' 2025-12-04T12:25:46.4844824Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4882635Z Entering 'third_party/onnx' 2025-12-04T12:25:46.4922743Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4976966Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T12:25:46.5015459Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5056503Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T12:25:46.5096807Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5138549Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T12:25:46.5177302Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5213312Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T12:25:46.5251798Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5291358Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T12:25:46.5328557Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5364818Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T12:25:46.5402962Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5440518Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T12:25:46.5480333Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5516609Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T12:25:46.5555662Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5600462Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T12:25:46.5640421Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5674132Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:46.5712142Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5751363Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:46.5787869Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5836541Z Entering 
'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T12:25:46.5873772Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5938298Z Entering 'third_party/pocketfft' 2025-12-04T12:25:46.5977817Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6018554Z Entering 'third_party/protobuf' 2025-12-04T12:25:46.6058139Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6102235Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T12:25:46.6140731Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6179316Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T12:25:46.6217879Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6258632Z Entering 'third_party/psimd' 2025-12-04T12:25:46.6298013Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6340256Z Entering 'third_party/pthreadpool' 2025-12-04T12:25:46.6380611Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6419730Z Entering 'third_party/pybind11' 2025-12-04T12:25:46.6458400Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6496064Z Entering 'third_party/python-peachpy' 2025-12-04T12:25:46.6535602Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6572142Z Entering 'third_party/sleef' 2025-12-04T12:25:46.6611576Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6647162Z Entering 'third_party/tensorpipe' 2025-12-04T12:25:46.6687518Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6722406Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T12:25:46.6761550Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6796756Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T12:25:46.6836363Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6873700Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T12:25:46.6912798Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6947292Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T12:25:46.6987208Z http.https://github.com/.extraheader 2025-12-04T12:25:46.7019810Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T12:25:46.7058110Z http.https://github.com/.extraheader 2025-12-04T12:25:46.7124858Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:46.7167938Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T12:25:46.7487456Z Entering 'android/libs/fbjni' 2025-12-04T12:25:46.7521309Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T12:25:46.7538396Z Entering 'third_party/FP16' 2025-12-04T12:25:46.7565459Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T12:25:46.7581029Z Entering 'third_party/FXdiv' 2025-12-04T12:25:46.7609855Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T12:25:46.7625385Z Entering 'third_party/NNPACK' 2025-12-04T12:25:46.7653397Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T12:25:46.7672067Z Entering 'third_party/NVTX' 2025-12-04T12:25:46.7698667Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T12:25:46.7716797Z Entering 
'third_party/VulkanMemoryAllocator' 2025-12-04T12:25:46.7745577Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T12:25:46.7766929Z Entering 'third_party/XNNPACK' 2025-12-04T12:25:46.7794111Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T12:25:46.7830168Z Entering 'third_party/aiter' 2025-12-04T12:25:46.7855865Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T12:25:46.7876084Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T12:25:46.7899440Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T12:25:46.7926235Z Entering 'third_party/benchmark' 2025-12-04T12:25:46.7954525Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T12:25:46.7973089Z Entering 'third_party/composable_kernel' 2025-12-04T12:25:46.8000327Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T12:25:46.8027072Z Entering 'third_party/cpp-httplib' 2025-12-04T12:25:46.8054390Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T12:25:46.8072932Z Entering 'third_party/cpuinfo' 2025-12-04T12:25:46.8098604Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T12:25:46.8118107Z Entering 'third_party/cudnn_frontend' 2025-12-04T12:25:46.8144528Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T12:25:46.8163618Z Entering 'third_party/cutlass' 2025-12-04T12:25:46.8191643Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T12:25:46.8218913Z Entering 'third_party/fbgemm' 2025-12-04T12:25:46.8246697Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T12:25:46.8265375Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T12:25:46.8293128Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T12:25:46.8312037Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T12:25:46.8338442Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T12:25:46.8365375Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T12:25:46.8391771Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T12:25:46.8410027Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T12:25:46.8437510Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T12:25:46.8464099Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T12:25:46.8491019Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T12:25:46.8506844Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T12:25:46.8534683Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T12:25:46.8553744Z Entering 'third_party/fbgemm/external/json' 2025-12-04T12:25:46.8578358Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T12:25:46.8600480Z Entering 'third_party/flash-attention' 2025-12-04T12:25:46.8626320Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T12:25:46.8645706Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T12:25:46.8672502Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T12:25:46.8695867Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T12:25:46.8722473Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T12:25:46.8751200Z Entering 'third_party/flatbuffers' 2025-12-04T12:25:46.8777904Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T12:25:46.8799434Z Entering 'third_party/fmt' 2025-12-04T12:25:46.8825263Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T12:25:46.8844934Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T12:25:46.8872075Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T12:25:46.8890141Z Entering 'third_party/gloo' 2025-12-04T12:25:46.8917919Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T12:25:46.8937949Z Entering 'third_party/googletest' 2025-12-04T12:25:46.8964648Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:25:46.8981053Z Entering 'third_party/ideep' 2025-12-04T12:25:46.9008715Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T12:25:46.9024040Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T12:25:46.9050836Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T12:25:46.9079078Z Entering 'third_party/ittapi' 2025-12-04T12:25:46.9103692Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T12:25:46.9122304Z Entering 'third_party/kineto' 2025-12-04T12:25:46.9152293Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T12:25:46.9170422Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T12:25:46.9197404Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T12:25:46.9214829Z 
Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T12:25:46.9242063Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T12:25:46.9259372Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T12:25:46.9286450Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T12:25:46.9301941Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T12:25:46.9328165Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T12:25:46.9346258Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T12:25:46.9373577Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T12:25:46.9390504Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T12:25:46.9415503Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T12:25:46.9436389Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T12:25:46.9460401Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T12:25:46.9479421Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T12:25:46.9503084Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:25:46.9522144Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T12:25:46.9547223Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T12:25:46.9567160Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T12:25:46.9594427Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T12:25:46.9611993Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T12:25:46.9639263Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T12:25:46.9656569Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:46.9683041Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T12:25:46.9700887Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:46.9726840Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T12:25:46.9746823Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T12:25:46.9773752Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T12:25:46.9791985Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T12:25:46.9818020Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T12:25:46.9837946Z Entering 'third_party/kleidiai' 2025-12-04T12:25:46.9863078Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T12:25:46.9882604Z Entering 'third_party/mimalloc' 2025-12-04T12:25:46.9907250Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T12:25:46.9925953Z Entering 'third_party/nlohmann' 2025-12-04T12:25:46.9954763Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T12:25:46.9975047Z Entering 'third_party/onnx' 2025-12-04T12:25:47.0002085Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T12:25:47.0040796Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T12:25:47.0064904Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T12:25:47.0086869Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T12:25:47.0115617Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T12:25:47.0136644Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T12:25:47.0162364Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T12:25:47.0178569Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T12:25:47.0204145Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:25:47.0219468Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T12:25:47.0246796Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T12:25:47.0261976Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T12:25:47.0290053Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T12:25:47.0305607Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T12:25:47.0331902Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config 
remote.origin.url 2025-12-04T12:25:47.0347498Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T12:25:47.0375094Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T12:25:47.0394851Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T12:25:47.0418772Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T12:25:47.0436399Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:47.0459812Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T12:25:47.0480651Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:47.0505510Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T12:25:47.0524483Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T12:25:47.0552899Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T12:25:47.0592455Z Entering 'third_party/pocketfft' 2025-12-04T12:25:47.0617938Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T12:25:47.0636565Z Entering 'third_party/protobuf' 2025-12-04T12:25:47.0661309Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T12:25:47.0683673Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T12:25:47.0707259Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T12:25:47.0724168Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T12:25:47.0753050Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:25:47.0773214Z Entering 'third_party/psimd' 2025-12-04T12:25:47.0800573Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T12:25:47.0818491Z Entering 'third_party/pthreadpool' 2025-12-04T12:25:47.0847257Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T12:25:47.0862846Z Entering 'third_party/pybind11' 2025-12-04T12:25:47.0890660Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T12:25:47.0906985Z Entering 'third_party/python-peachpy' 2025-12-04T12:25:47.0937829Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T12:25:47.0955370Z Entering 'third_party/sleef' 2025-12-04T12:25:47.0979514Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T12:25:47.0998829Z Entering 'third_party/tensorpipe' 
2025-12-04T12:25:47.1024525Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T12:25:47.1043251Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T12:25:47.1067413Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:25:47.1085597Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T12:25:47.1109262Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T12:25:47.1127080Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T12:25:47.1155256Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T12:25:47.1173209Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T12:25:47.1199181Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T12:25:47.1215215Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T12:25:47.1242904Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T12:25:47.1280470Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1309634Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1337888Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1363379Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1391293Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1417579Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1445094Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1474851Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1500683Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1525933Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T12:25:47.1551890Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1578450Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1603127Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1629173Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1654414Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1679770Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1705049Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1730253Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1757080Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1781584Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1806719Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1832369Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1860135Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1882500Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1907572Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1932428Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1957795Z 
[command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1981957Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2007626Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2032946Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2059515Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2083545Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2109457Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2136966Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2162358Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2187961Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2212926Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2240730Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2266237Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2291796Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2318289Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2344229Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2369439Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2396109Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2421365Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2455695Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2482888Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2509371Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2539623Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2564499Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2591080Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2616262Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2642394Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2667304Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2692301Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2718170Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2743383Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2768156Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2794304Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2819142Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2844307Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2869362Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2894537Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2920352Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2946654Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2971690Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2999500Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3026037Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3051242Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3079566Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3104864Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config 
--name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3130116Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3155840Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3180472Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3205075Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3230675Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3257573Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3282860Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3307859Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3333530Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3362157Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3477199Z A job completed hook has been configured by the self-hosted runner administrator 2025-12-04T12:25:47.3492719Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 2025-12-04T12:25:47.3498589Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:47.3499043Z ##[endgroup] 2025-12-04T12:25:47.3585103Z [!ALERT!] Swap in detected! [!ALERT!] 2025-12-04T12:25:58.4578724Z [!ALERT!] Swap out detected [!ALERT!] 2025-12-04T12:26:17.0286486Z Cleaning up orphan processes
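(Note: the "Post job cleanup" phases above are actions/checkout scrubbing the credentials it injected at checkout time: it re-registers the workspace as a safe.directory under a temporary HOME, unsets the http.https://github.com/.extraheader auth header in the superproject and, via git submodule foreach --recursive, in every submodule listed above, and then audits each submodule config for remote.origin.url and includeIf.gitdir entries. A condensed sketch of the equivalent commands follows; the workspace path is this runner's and is only illustrative.)

    # Condensed sketch of the credential scrub actions/checkout runs after the job
    # (illustrative; the real action also copies .gitconfig under a temporary HOME).
    REPO=/home/ec2-user/actions-runner/_work/pytorch/pytorch
    git config --global --add safe.directory "$REPO"
    cd "$REPO"

    # Drop any per-repo sshCommand override, then the injected auth header, in the superproject...
    git config --local --unset-all core.sshCommand || :
    git config --local --unset-all http.https://github.com/.extraheader || :

    # ...and recursively in every submodule (the long "Entering ..." listing above).
    git submodule foreach --recursive \
      sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' \
             && git config --local --unset-all 'http.https://github.com/.extraheader' || :"

    # Finally, each submodule config is checked for conditional includes that could
    # otherwise leak settings between jobs.
    git submodule foreach --recursive \
      'git config --local --name-only --get-regexp ^includeIf\.gitdir: || :'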
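The closing "[!ALERT!] Swap in detected!" / "Swap out detected" lines come from the self-hosted runner's job-completed hook (/home/ec2-user/runner-scripts/after_job.sh), which warns that the instance swapped during the job. The hook itself is not included in this log; the sketch below shows one way such a check could be written against the kernel's pswpin/pswpout counters in /proc/vmstat. The snapshot file, thresholds, and overall structure are assumptions for illustration, not the actual script.

    #!/usr/bin/env bash
    # Hypothetical sketch of a swap-activity alert in the spirit of after_job.sh.
    # Not the actual runner hook; the snapshot path is an assumption.

    SNAPSHOT=/tmp/vmstat.before_job   # hypothetical snapshot written by a before-job hook

    counter() {
        awk -v key="$1" '$1 == key { print $2 }' "$2"
    }

    for key in pswpin pswpout; do
        now=$(counter "$key" /proc/vmstat)
        before=$(counter "$key" "$SNAPSHOT" 2>/dev/null || echo 0)
        if [ "${now:-0}" -gt "${before:-0}" ]; then
            case "$key" in
                pswpin)  echo "[!ALERT!] Swap in detected! [!ALERT!]" ;;
                pswpout) echo "[!ALERT!] Swap out detected [!ALERT!]" ;;
            esac
        fi
    done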